ASTHMA RELATED GENES



PCT/US98/01260 



Introduction 

Asthma is a disease of reversible bronchial obstruction, characterized by
airway inflammation, epithelial damage, airway smooth muscle hypertrophy and
bronchial hyperreactivity. Many asthma symptoms can be controlled by medical
intervention, but the incidence of asthma-related death and severe illness
continues to rise in the United States. The approximately 4,800 deaths in 1989
marked a 46 percent increase since 1980. As many as 12 million people in the
United States have asthma, up 66 percent since 1980, and annually, the disease's
medical and indirect costs are estimated at over $6 billion.

Two common subdivisions of asthma are atopic (allergic, or extrinsic) asthma
and non-atopic (intrinsic) asthma. Atopy is characterized by a predisposition to
raise an IgE antibody response to common environmental antigens. In atopic
asthma, asthma symptoms and evidence of allergy, such as a positive skin test to
common allergens, are both present. Non-atopic asthma may be defined as
reversible airflow limitation in the absence of allergies.

The smooth muscle surrounding the bronchi is able to rapidly alter airway
diameter in response to stimuli. When the response is excessive, it is termed
bronchial hyperreactivity, a characteristic of asthma thought to have a heritable
component. Studies have demonstrated a genetic predisposition to asthma by
showing, for example, a greater concordance for this trait among monozygotic twins
than among dizygotic twins. The genetics of asthma is complex, however, and
shows no simple pattern of inheritance. Environment also plays a role in asthma
development; for example, children of smokers are more likely to develop asthma
than are children of non-smokers.

In recent years thousands of human genes have been cloned. In many
cases, gene discovery has been based on prior knowledge about the corresponding
protein, such as amino acid sequence, immunological reactivity, etc. This approach
has been very successful, but is limited in some important ways. One limitation is
that genes in these cases are identified based on knowledge of molecular level
protein properties. For a large number of important human genes, however, there
is little or no biochemical data concerning the encoded product. For example,
genes that predispose to human diseases, such as cystic fibrosis, Huntington's
disease, etc., are of interest because of their phenotypic effect. Biochemical
characterization of such genes may be secondary to genetic characterization.
A solution to this impasse has been found in combining classical genetic
mapping with the ability to identify genes and, if necessary, to sequence large
regions of chromosomes. Population and family studies enable genes associated
with a trait of interest to be localized to a relatively small region of a chromosome.
At this point, physical mapping can be used to identify candidate genes, and
various molecular biology techniques used to pick out mutated genes in affected
individuals. This "top-down" approach to gene discovery has been termed
positional cloning, because genes are identified based on position in the genome.

Positional cloning is now being applied to complex genetic diseases, which
affect a greater fraction of humanity than do the more simple and usually rarer
single gene disorders. Such studies must take into account the contribution of both
environmental and genetic factors to the development of disease, and must allow
for contributions to the genetic component by more than one, and potentially many,
genes. The clinical importance of asthma makes it of considerable interest to
characterize genes that underlie a genetic predisposition to this disease. Positional
cloning provides an approach to this goal.

Relevant Literature 

The symptoms and biology of asthma are reviewed in Chanez et al. (1994)
Odyssey 1:24-33. A review of bronchial hyperreactivity may be found in Smith and
McFadden (1995) Ann. Allergy, Asthma and Immunol. 74:454. Moss (1989) Annals
of Allergy 63:566 reviews the allergic etiology and immunology of asthma.

The genetic dissection of complex traits is discussed in Lander and Schork
(1994) Science 265:2037-2048. Genetic mapping of candidate genes for atopy
and/or bronchial hyperreactivity is described in Postma et al. (1995) N.E.J.M.
333:894; Marsh et al. (1994) Science 264:1152; and Meyers et al. (1994) Genomics
23:464.

Lawrence et al. (1994) Ann. Hum. Genet. 58:359 discuss an approach to the
genetic analysis of atopy and asthma. Genetic linkage between the alpha subunit
of the T cell receptor and IgE reactions has been noted by Moffat et al. (1994) The
Lancet 343:1597. Caraballo and Hernandez (1990) Tissue Antigens 35:182 noted
an association between HLA alleles and allergic asthma. Evidence of linkage of
atopy to markers on chromosome 11q has been seen in some British asthma
families (Cookson et al. (1989) Lancet 1:1292-1295; Young et al. (1991) J. Med.
Genet. 29:236), but not in other British families (Lympany et al. (1992) Clin. Exp.
Allergy 22:1085-1092) or in families from Minnesota or Japan (Rich et al. (1992)
Clin. Exp. Allergy 22:1070-1076; and Hizawa et al. (1992) Clin. Exp. Allergy
22:1065).

The association of a polymorphism for the FcεRI-β gene and risk of atopy is
described in Hill et al. (1995) B.M.J. 311:776; Hill and Cookson (1996) Human Mol.
Genet. 5:959; and Shirakawa et al. (1994) Nature Genetics 7:125; an association of
FcεRI-β with bronchial hyperreactivity is described in van Herwerden (1995) The
Lancet 346:1262.

Collections of polymorphic markers from throughout the human genome
have been tested for linkage to asthma, as described in Meyers et al. (1996) Am. J.
Hum. Genet. 59:A228 and Daniels et al. (1996) Nature 383:247-250. No linkage to
human chromosome 11p was detected in these studies.



Summary of the Invention 
Human genes associated with a genetic predisposition to asthma are
provided. The genes, herein termed ASTH1I and ASTH1J, are located close to
each other on human chromosome 11p, have similar patterns of expression, and
share common sequence motifs. The nucleic acid compositions are used to produce
the encoded proteins, which may be employed for functional studies, as a
therapeutic, and in studying associated physiological pathways. The nucleic acid
compositions and antibodies specific for the protein are useful as diagnostics to
identify a hereditary predisposition to asthma.

Brief Description of the Drawings
Figure 1: Genomic organization of the ASTH1I and ASTH1J genes. The
sizes of the exons are not to scale. Alternative exons are hatched. The direction of
transcription is indicated below each gene.

Description of the Specific Embodiments 
The provided ASTH1 genes and fragments thereof, encoded protein, ASTH1
genomic regulatory regions, and anti-ASTH1 antibodies are useful in the
identification of individuals predisposed to development of asthma, and for the
modulation of gene activity in vivo for prophylactic and therapeutic purposes. The
encoded ASTH1 protein is useful as an immunogen to raise specific antibodies, in
drug screening for compositions that mimic or modulate ASTH1 activity or
expression, including altered forms of ASTH1 protein, and as a therapeutic.

Asthma, as defined herein, is reversible airflow limitation in a patient over a
period of time. The disease is characterized by increased airway responsiveness to
a variety of stimuli, and airway inflammation. A patient diagnosed as asthmatic will
generally have multiple indications over time, including wheezing, asthmatic
attacks, and a positive response to methacholine challenge, i.e. a PC20 on
methacholine challenge of less than about 4 mg/ml. Guidelines for diagnosis may
be found in the National Asthma Education Program Expert Panel, Guidelines for
diagnosis and management of asthma, National Institutes of Health, 1991; Pub.
#91-3042. Atopy, respiratory infection and environmental predisposing factors may
also be present, but are not necessary elements of an asthma diagnosis. Asthma
conditions strictly related to atopy are referred to as atopic asthma.
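The PC20 criterion can be illustrated with a short calculation. The sketch below assumes the log-linear interpolation that is commonly used to estimate PC20 from the two methacholine concentrations bracketing a 20% fall in FEV1; the function names and the example readings are hypothetical and are not taken from this specification. Only the cutoff of about 4 mg/ml comes from the text above.

```python
import math


def pc20(c1, r1, c2, r2):
    """Estimate PC20 (mg/ml) by log-linear interpolation (assumed method).

    c1, c2: second-to-last and last methacholine concentrations (mg/ml)
    r1, r2: percent fall in FEV1 observed after c1 and c2 (r1 < 20 <= r2)
    """
    if not (r1 < 20 <= r2):
        raise ValueError("readings must bracket a 20% fall in FEV1")
    log_c1, log_c2 = math.log10(c1), math.log10(c2)
    log_pc20 = log_c1 + (log_c2 - log_c1) * (20 - r1) / (r2 - r1)
    return 10 ** log_pc20


def is_positive_challenge(pc20_value, threshold=4.0):
    """Positive response: PC20 below about 4 mg/ml, as defined in the text."""
    return pc20_value < threshold


# Hypothetical readings: 10% fall at 1 mg/ml, 30% fall at 2 mg/ml.
example_pc20 = pc20(1.0, 10.0, 2.0, 30.0)
example_positive = is_positive_challenge(example_pc20)
```

With these illustrative readings the interpolated PC20 falls between the two tested concentrations and below the 4 mg/ml cutoff, so the challenge would be scored as positive.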

The human ASTH1I and ASTH1J gene sequences are provided, as are the
genomic sequences 5' to ASTH1J. The major sequences of interest provided in the
sequence listing are as follows:

ASTH1J 5' Genomic Region             DNA        (SEQ ID NO:1)
ASTH1J alt1                          cDNA       (SEQ ID NO:2)
ASTH1J alt2                          cDNA       (SEQ ID NO:3)
ASTH1J alt3                          cDNA       (SEQ ID NO:4)
ASTH1J protein                       protein    (SEQ ID NO:5)
ASTH1I alt1                          cDNA       (SEQ ID NO:6)
ASTH1I alt1 protein                  protein    (SEQ ID NO:7)
ASTH1I alt2                          cDNA       (SEQ ID NO:8)
ASTH1I alt2 protein                  protein    (SEQ ID NO:9)
ASTH1I alt3                          cDNA       (SEQ ID NO:10)
ASTH1I alt3 protein                  protein    (SEQ ID NO:11)
CAAT box "A" form                    DNA        (SEQ ID NO:12)
CAAT box "G" form                    DNA        (SEQ ID NO:13)
ASTH1J 5' promoter region            DNA        (SEQ ID NO:14)
Mouse asth1j                         cDNA       (SEQ ID NO:338)
Mouse asth1j                         protein    (SEQ ID NO:339)
Polymorphisms                        DNA        (SEQ ID NO:16-159)
Microsatellite flanking sequences    DNA        (SEQ ID NO:160-281)
Microsatellite repeats               DNA        (SEQ ID NO:282-292)
Intron-Exon boundaries               DNA        (SEQ ID NO:293-335)



The ASTH1 locus has been mapped to human chromosome 11p. The traits
for a positive response to methacholine challenge and a clinical history of asthma
were shown to be genetically linked in a genome scan of the population of Tristan
da Cunha, a single large extended family with a high incidence of asthma
(discussed in Zamel et al. (1996) Am. J. Respir. Crit. Care Med. 153:1902-1906).
The linkage finding was replicated in a set of Canadian asthmatic families. The
region of strongest linkage was the marker D11S907 on the short arm of
chromosome 11. Additional markers were identified from the four megabase region
surrounding D11S907 from public databases and by original cloning of new
polymorphic microsatellite markers. Refinement of the region of interest was
obtained by genotyping new markers in the studied populations, and applying the
transmission disequilibrium test (TDT), which reflects the level of association
between marker alleles and disease status. TDT curves were superimposed on the
physical map. Molecular genetic techniques for gene identification were applied to
the region of interest. A one megabase genomic region was sequenced to high
accuracy, and the resulting data used for the sequence-based prediction of genes
and determination of the intron/exon structure of genes in the region.
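As an illustration of how a TDT value for a single marker allele may be computed, the following sketch uses the standard McNemar-type statistic, (b - c)^2 / (b + c), where b and c count how often heterozygous parents did or did not transmit the allele to an affected child. The function names and the counts are hypothetical; the specification does not state which software or exact procedure was used to generate the TDT curves.

```python
import math


def tdt_statistic(transmitted, untransmitted):
    """McNemar-type TDT chi-square (1 degree of freedom) for one allele.

    transmitted:   times heterozygous parents passed the allele to an
                   affected offspring
    untransmitted: times heterozygous parents did not pass it on
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    return (transmitted - untransmitted) ** 2 / total


def chi2_1df_pvalue(chi2):
    """Upper-tail p-value of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical counts for one marker allele.
example_chi2 = tdt_statistic(transmitted=30, untransmitted=10)
example_p = chi2_1df_pvalue(example_chi2)
```

Plotting such a statistic for each marker against its position on the physical map gives a curve of the kind described above, with the peak suggesting where association with disease status is strongest.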

Nucleic Acid Compositions 
ASTH1I produces a 2.8 kb mRNA expressed at high levels in trachea and
prostate, and at lower levels in lung and kidney and possibly other tissues. ASTH1I
cDNA clones have also been identified in prostate, testis and lung libraries.
Sequence polymorphisms are shown in Table 3. ASTH1I has at least three
alternate forms denoted as alt1, alt2, and alt3. The alternative splicing and start
codons give the three forms of ASTH1I proteins different amino termini. The
ASTH1I proteins, alt1, alt2 and alt3, are 265, 255 and 164 amino acids in length,
respectively.

A domain of the ASTH1I and ASTH1J proteins is similar in sequence to
transcription factors of the ets family. The ets family is a group of transcription
factors that activate genes involved in a variety of immunological and other
processes. The family members most similar to ASTH1I and ASTH1J are: ETS1,
ETS2, ESX, ELF, ELK1, TEL, NET, SAP-1, NERF and FLI. The ASTH1I and
ASTH1J proteins show similarity to each other. Over the ets domain they are 66%
similar (i.e. have amino acids with similar properties in the same positions) and 46%
identical to each other. All forms of ASTH1I and ASTH1J have a helix-turn-helix
motif, characteristic of some transcription factors, located near the carboxy terminal
end of the protein.
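The distinction between percent similarity and percent identity can be made concrete with a small sketch. It assumes two pre-aligned sequences of equal length and a simple, hypothetical grouping of amino acids by side-chain property; the grouping, the function name and the toy sequences are illustrative assumptions and are not the scoring scheme used to obtain the 66% and 46% figures above. Identical positions are counted as similar, so similarity is never lower than identity.

```python
# Hypothetical property groups; real tools typically use a substitution matrix.
SIMILARITY_GROUPS = [
    set("AVLIMC"),  # aliphatic / hydrophobic
    set("FWYH"),    # aromatic
    set("STNQ"),    # polar, uncharged
    set("KR"),      # basic
    set("DE"),      # acidic
    set("GP"),      # small / conformationally special
]


def identity_and_similarity(aligned_a, aligned_b):
    """Return (percent identity, percent similarity) over aligned positions.

    Both inputs must have the same length; '-' marks a gap and gapped
    columns are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = similar = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            identical += 1
            similar += 1
        elif any(a in group and b in group for group in SIMILARITY_GROUPS):
            similar += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * identical / compared, 100.0 * similar / compared


# Toy alignment: K/R similar, L/L identical, D/E similar, E/Q dissimilar.
example_identity, example_similarity = identity_and_similarity("KLDE", "RLEQ")
```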

ASTH1J produces an approximately 6 kb mRNA expressed at high levels in
the trachea, prostate and pancreas and at lower levels in colon, small intestine, lung
and stomach. ASTH1J has at least three forms, consisting of the alt1, alt2 and alt3
forms. The open reading frame is identical for the three forms, which differ only in
the 5' UTR. The protein encoded by ASTH1J is 300 amino acids in length.

Mouse coding region sequence of asth1j is provided in SEQ ID NO:326, and
the amino acid sequence is provided in SEQ ID NO:327. The mouse and human
proteins have 88.4% identity throughout their length. The match in the ets
domain is 100%. The mouse cDNA was identified by hybridization of a full-length
human cDNA to a mouse lung cDNA library (Stratagene).

The term "ASTH1 genes" is herein used generically to designate ASTH1I
and ASTH1J genes and their alternate forms. The two genes lie in opposite
orientations on a native chromosome, with the 5' regulatory sequences between
them. Part of the genomic sequence between the two coding regions is provided as
SEQ ID NO:1. The term "ASTH1 locus" is used herein to refer to the two genes in
all alternate forms and the genomic sequence that lies between the two genes.
Alternate forms include splicing variants, and polymorphisms in the sequence.
Specific polymorphic sequences are provided in SEQ ID NOs:16-159. For some
purposes the previously known EST sequences described herein may be excluded
from the sequences defined as the ASTH1 locus.

The DNA sequence encoding ASTH1 may be cDNA or genomic DNA or a
fragment thereof. The term "ASTH1 gene" shall be intended to mean the open
reading frame encoding specific ASTH1 polypeptides, introns, as well as adjacent 5'
and 3' non-coding nucleotide sequences involved in the regulation of expression,
up to about 1 kb beyond the coding region, but possibly further in either direction.
The gene may be introduced into an appropriate vector for extrachromosomal
maintenance or for integration into the host.

The term "cDNA" as used herein is intended to include all nucleic acids that
share the arrangement of sequence elements found in native mature mRNA
species, where sequence elements are exons and 3' and 5' non-coding regions.
Normally mRNA species have contiguous exons, with the intervening introns
removed by nuclear RNA splicing, to create a continuous open reading frame
encoding the ASTH1 protein.

The genomic ASTH1 sequence has non-contiguous open reading frames,
where introns interrupt the protein coding regions. A genomic sequence of interest
comprises the nucleic acid present between the initiation codon and the stop codon,
as defined in the listed sequences, including all of the introns that are normally
present in a native chromosome. It may further include the 3' and 5' untranslated
regions found in the mature mRNA. It may further include specific transcriptional
and translational regulatory sequences, such as promoters, enhancers, etc.,
including about 1 kb, but possibly more, of flanking genomic DNA at either the 5' or
3' end of the transcribed region. The genomic DNA may be isolated as a fragment
of 100 kbp or smaller, and substantially free of flanking chromosomal sequence.
Genomic regions of interest include the non-transcribed sequences 5' to
ASTH1J, as provided in SEQ ID NO:1. This region of DNA contains the native
promoter elements that direct expression of the linked ASTH1J gene. Usually a
promoter region will have at least about 140 nt of sequence located 5' to the ASTH1
gene and further comprising a TATA box and CAAT box motif sequence (SEQ ID
NO:14, nt 597-736). The promoter region may further comprise a consensus ets
binding motif, (C/A)GGA(A/T) (SEQ ID NO:14, nt 1-5). A region of particular
interest, containing the ets binding motif, TATA box and CAAT box motifs 5' to the
ASTH1J gene, is provided in SEQ ID NO:14. The position of SEQ ID NO:14 within
the larger sequence is SEQ ID NO:1, nt 60359-61095. The promoter sequence
may comprise polymorphisms within the CAAT box region, for example those
shown in SEQ ID NO:12 and SEQ ID NO:13, which have been shown to affect the
function of the promoter. The promoter region of interest may extend 5' to SEQ ID
NO:14 within the larger sequence, e.g. SEQ ID NO:1, nt 59000-61095; SEQ ID
NO:1, nt 5700-61095, etc.
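A consensus such as (C/A)GGA(A/T) translates directly into a pattern search. The following minimal sketch scans the given strand of a DNA sequence for the core motif and reports 1-based positions; the function name and the test sequence are hypothetical, and the sequence is not taken from SEQ ID NO:14. Scanning the complementary strand would additionally require searching the reverse complement.

```python
import re

# (C/A)GGA(A/T) written as a character-class pattern. The lookahead lets
# overlapping occurrences be reported as well.
ETS_CORE = re.compile(r"(?=([CA]GGA[AT]))")


def find_ets_core_sites(sequence):
    """Return (1-based position, matched motif) pairs on the given strand."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in ETS_CORE.finditer(seq)]


# Hypothetical test sequence containing two matches to the consensus.
example_sites = find_ets_core_sites("TTCGGAAGTAGGATCC")
```

A motif match of this kind is only a candidate binding site; as noted in the text, binding is established experimentally, for example by gel retardation studies.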

The sequence of this 5' region, and further 5' upstream sequences and 3'
downstream sequences, may be utilized for promoter elements, including enhancer
binding sites, that provide for expression in tissues where ASTH1J is expressed.
The tissue specific expression is useful for determining the pattern of expression,
and for providing promoters that mimic the native pattern of expression. Naturally
occurring polymorphisms in the promoter region are useful for determining natural
variations in expression, particularly those that may be associated with disease.
See, for example, SEQ ID NO:12 and 13. Alternatively, mutations may be
introduced into the promoter region to determine the effect of altering expression in
experimentally defined systems. Methods for the identification of specific DNA
motifs involved in the binding of transcriptional factors are known in the art, e.g.
sequence similarity to known binding motifs, gel retardation studies, etc. For
examples, see Blackwell et al. (1995) Mol. Med. 1:194-205; Mortlock et al. (1996)
Genome Res. 6:327-33; and Joulin and Richard-Foy (1995) Eur J Biochem
232:620-626.

The regulatory sequences may be used to identify cis acting sequences
required for transcriptional or translational regulation of ASTH1 expression,
especially in different tissues or stages of development, and to identify cis acting
sequences and trans acting factors that regulate or mediate ASTH1 expression.
Such transcription or translational control regions may be operably linked to an
ASTH1 gene in order to promote expression of wild type or altered ASTH1 or other
proteins of interest in cultured cells, or in embryonic, fetal or adult tissues, and for
gene therapy.

The nucleic acid compositions of the subject invention may encode all or a
part of the subject polypeptides. Fragments of the DNA sequence may be obtained
by chemically synthesizing oligonucleotides in accordance with conventional
methods, by restriction enzyme digestion, by PCR amplification, etc. For the most
part, DNA fragments will be of at least 15 nt, usually at least 18 nt, more usually at
least about 50 nt. Such small DNA fragments are useful as primers for PCR,
hybridization screening, etc. Larger DNA fragments, i.e. greater than 100 nt, are
useful for production of the encoded polypeptide. For use in amplification reactions,
such as PCR, a pair of primers will be used. The exact composition of the primer
sequences is not critical to the invention, but for most applications the primers will
hybridize to the subject sequence under stringent conditions, as known in the art. It
is preferable to choose a pair of primers that will generate an amplification product
of at least about 50 nt, preferably at least about 100 nt. Algorithms for the selection
of primer sequences are generally known, and are available in commercial software
packages. Amplification primers hybridize to complementary strands of DNA, and
will prime towards each other.
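The geometry of a primer pair can be sketched in a few lines. The example assumes exact primer matches on a single template strand: the forward primer matches the template as written, the reverse primer matches the opposite strand, and the product spans both primer sites. The function names, the primers and the template are hypothetical and do not correspond to any sequence in the sequence listing.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def amplicon_length(template, forward_primer, reverse_primer):
    """Length (nt) of the product of an exact-match primer pair, or None.

    The forward primer is located on the template as written; the reverse
    primer is located downstream via its reverse complement, so the two
    primers prime towards each other.
    """
    template = template.upper()
    forward = forward_primer.upper()
    start = template.find(forward)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse_primer)
    site = template.find(reverse_site, start + len(forward))
    if site == -1:
        return None
    return site + len(reverse_site) - start


# Hypothetical 18 nt primers and a toy template built around them.
FORWARD = "ATGCGTACGTTAGCCTAG"
REVERSE = "TTGACCGATCGGATCCAA"
TEMPLATE = "GG" + FORWARD + "ACGT" * 10 + reverse_complement(REVERSE) + "CC"

example_length = amplicon_length(TEMPLATE, FORWARD, REVERSE)
meets_minimum = example_length is not None and example_length >= 50
```

The final comparison applies the preference stated above for a product of at least about 50 nt. Real primer selection also weighs melting temperature, specificity and secondary structure, which this sketch does not model.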

The ASTH1 genes are isolated and obtained in substantial purity, generally
as other than an intact mammalian chromosome. Usually, the DNA will be obtained
substantially free of other nucleic acid sequences that do not include an ASTH1
sequence or fragment thereof, generally being at least about 50%, usually at least
about 90% pure and are typically "recombinant", i.e. flanked by one or more
nucleotides with which it is not normally associated on a naturally occurring
chromosome.

The DNA sequences are used in a variety of ways. They may be used as
probes for identifying ASTH1 related genes. Mammalian homologs have
substantial sequence similarity to the subject sequences, i.e. at least 75%, usually
at least 90%, more usually at least 95% sequence identity with the nucleotide
sequence of the subject DNA sequence. Sequence similarity is calculated based
on a reference sequence, which may be a subset of a larger sequence, such as a
conserved motif, coding region, flanking region, etc. A reference sequence will
usually be at least about 18 nt long, more usually at least about 30 nt long, and may
extend to the complete sequence that is being compared. Algorithms for sequence
analysis are known in the art, such as BLAST, described in Altschul et al. (1990) J
Mol Biol 215:403-10.

Nucleic acids having sequence similarity are detected by hybridization under
low stringency conditions, for example, at 50°C and 10XSSC (0.9 M saline/0.09 M
sodium citrate) and remain bound when subjected to washing at 55°C in 1XSSC.
Sequence identity may be determined by hybridization under stringent conditions,
for example, at 50°C or higher and 0.1XSSC (9 mM saline/0.9 mM sodium citrate).
By using probes, particularly labeled probes of DNA sequences, one can isolate
homologous or related genes. The source of homologous genes may be any
species, e.g. primate species, particularly human; rodents, such as rats and mice;
canines, felines, bovines, ovines, equines, yeast, Drosophila, Caenorhabditis, etc.

The DNA may also be used to identify expression of the gene in a biological
specimen. The manner in which one probes cells for the presence of particular
nucleotide sequences, as genomic DNA or RNA, is well established in the literature
and does not require elaboration here. mRNA is isolated from a cell sample.
mRNA may be amplified by RT-PCR, using reverse transcriptase to form a
complementary DNA strand, followed by polymerase chain reaction amplification
using primers specific for the subject DNA sequences. Alternatively, an mRNA
sample is separated by gel electrophoresis, transferred to a suitable support, e.g.
nitrocellulose, nylon, etc., and then probed with a fragment of the subject DNA as a
probe. Other techniques, such as oligonucleotide ligation assays, in situ
hybridizations, and hybridization to DNA probes arrayed on a solid chip may also
find use. Detection of mRNA hybridizing to the subject sequence is indicative of
ASTH1 gene expression in the sample.

The subject nucleic acid sequences may be modified for a number of
purposes, particularly where they will be used intracellularly, for example, by being
joined to a nucleic acid cleaving agent, e.g. a chelated metal ion, such as iron or
chromium for cleavage of the gene; or the like.

The sequence of the ASTH1 locus, including flanking promoter regions and
coding regions, may be mutated in various ways known in the art to generate
targeted changes in promoter strength, sequence of the encoded protein, etc. The
DNA sequence or product of such a mutation will be substantially similar to the
sequences provided herein, i.e. will differ by at least one nucleotide or amino acid,
respectively, and may differ by at least two but not more than about ten nucleotides
or amino acids. The sequence changes may be substitutions, insertions or
deletions. Deletions may further include larger changes, such as deletions of a
domain or exon. Other modifications of interest include epitope tagging, e.g. with
the FLAG system, HA, etc. For studies of subcellular localization, fusion proteins
with green fluorescent proteins (GFP) may be used. Such mutated genes may be
used to study structure-function relationships of ASTH1 polypeptides, or to alter
properties of the protein that affect its function or regulation. For example,
constitutively active transcription factors, or a dominant negatively active protein
that binds to the ASTH1 DNA target site without activating transcription, may be
created in this manner.

Techniques for in vitro mutagenesis of cloned genes are known. Examples
of protocols for scanning mutations may be found in Gustin et al., Biotechniques
14:22 (1993); Barany, Gene 37:111-23 (1985); Colicelli et al., Mol Gen Genet
199:537-9 (1985); and Prentki et al., Gene 29:303-13 (1984). Methods for site
specific mutagenesis can be found in Sambrook et al., Molecular Cloning: A
Laboratory Manual, CSH Press 1989, pp. 15.3-15.108; Weiner et al., Gene
126:35-41 (1993); Sayers et al., Biotechniques 13:592-6 (1992); Jones and
Winistorfer, Biotechniques 12:528-30 (1992); Barton et al., Nucleic Acids Res
18:7349-55 (1990); Marotti and Tomich, Gene Anal Tech 6:67-70 (1989); and Zhu,
Anal Biochem 177:120-4 (1989).

Synthesis of ASTH1 Proteins
The subject gene may be employed for synthesis of a complete ASTH1
protein, or polypeptide fragments thereof, particularly fragments corresponding to
functional domains; binding sites; etc.; and including fusions of the subject
polypeptides to other proteins or parts thereof. For expression, an expression
cassette may be employed, providing for a transcriptional and translational initiation
region, which may be inducible or constitutive, where the coding region is operably
linked under the transcriptional control of the transcriptional initiation region, and a
transcriptional and translational termination region. Various transcriptional initiation
regions may be employed that are functional in the expression host.

The polypeptides may be expressed in prokaryotes or eukaryotes in
accordance with conventional ways, depending upon the purpose for expression.
For large scale production of the protein, a unicellular organism, such as E. coli, B.
subtilis, S. cerevisiae, or cells of a higher organism such as vertebrates, particularly
mammals, e.g. COS 7 cells, may be used as the expression host cells. In many
situations, it may be desirable to express the ASTH1 gene in mammalian cells,
where the ASTH1 gene will benefit from native folding and post-translational
modifications. Small peptides can also be synthesized in the laboratory.

With the availability of the polypeptides in large amounts, by employing an
expression host, the polypeptides may be isolated and purified in accordance with
conventional ways. A lysate may be prepared of the expression host and the lysate
purified using HPLC, exclusion chromatography, gel electrophoresis, affinity
chromatography, or other purification technique. The purified polypeptide will
generally be at least about 80% pure, preferably at least about 90% pure, and may
be up to and including 100% pure. Pure is intended to mean free of other proteins,
as well as cellular debris.

The polypeptide is used for the production of antibodies, where short
fragments provide for antibodies specific for the particular polypeptide, and larger
fragments or the entire protein allow for the production of antibodies over the
surface of the polypeptide. Antibodies may be raised to the wild-type or variant
forms of ASTH1. Antibodies may be raised to isolated peptides corresponding to
these domains, or to the native protein, e.g. by immunization with cells expressing
ASTH1, immunization with liposomes having ASTH1 inserted in the membrane, etc.

Antibodies are prepared in accordance with conventional ways, where the
expressed polypeptide or protein is used as an immunogen, by itself or conjugated
to known immunogenic carriers, e.g. KLH, pre-S HBsAg, other viral or eukaryotic
proteins, or the like. Various adjuvants may be employed, with a series of
injections, as appropriate. For monoclonal antibodies, after one or more booster
injections, the spleen is isolated, the lymphocytes immortalized by cell fusion, and
then screened for high affinity antibody binding. The immortalized cells, i.e.
hybridomas, producing the desired antibodies may then be expanded. For further
description, see Monoclonal Antibodies: A Laboratory Manual, Harlow and Lane
eds., Cold Spring Harbor Laboratories, Cold Spring Harbor, New York, 1988. If
desired, the mRNA encoding the heavy and light chains may be isolated and
mutagenized by cloning in E. coli, and the heavy and light chains mixed to further
enhance the affinity of the antibody. Alternatives to in vivo immunization as a
method of raising antibodies include binding to phage "display" libraries, usually in
conjunction with in vitro affinity maturation.

Detection of ASTH1 Associated Asthma

Diagnosis of ASTH1 associated asthma is performed by protein, DNA or
RNA sequence and/or hybridization analysis of any convenient sample from a
patient, e.g. biopsy material, blood sample, scrapings from cheek, etc. A nucleic
acid sample from a patient having asthma that may be associated with ASTH1 is
analyzed for the presence of a predisposing polymorphism in ASTH1. A typical
patient genotype will have at least one predisposing mutation on at least one
chromosome. The presence of a polymorphic ASTH1 sequence that affects the
activity or expression of the gene product, and confers an increased susceptibility to
asthma, is considered a predisposing polymorphism. Individuals are screened by
analyzing their DNA or mRNA for the presence of a predisposing polymorphism, as
compared to an asthma neutral sequence. Specific sequences of interest include
any polymorphism that leads to clinical bronchial hyperreactivity or is otherwise
associated with asthma, including, but not limited to, insertions, substitutions and
deletions in the coding region sequence, intron sequences that affect splicing, or
promoter or enhancer sequences that affect the activity and expression of the
protein. Examples of specific ASTH1 polymorphisms in asthma patients are listed
in Tables 3-8.

The CAAT box polymorphism of SEQ ID NO:12 and 13 (which is located
within SEQ ID NO:14) is of particular interest. The "G" form, SEQ ID NO:13, can be
associated with a propensity to develop bronchial hyperreactivity or asthma. Other
polymorphisms in the surrounding region affect this association. It has been found
that substitution of "G" for "A" results in decreased binding of nuclear proteins to the
DNA motif.

The effect of an ASTH1 predisposing polymorphism may be modulated by
the patient genotype in other genes related to asthma and atopy, including, but not
limited to, the Fcε receptor, Class I and Class II HLA antigens, T cell receptor and
immunoglobulin genes, cytokines and cytokine receptors, and the like.

Screening may also be based on the functional or antigenic characteristics of
the protein. Immunoassays designed to detect predisposing polymorphisms in
ASTH1 proteins may be used in screening. Where many diverse mutations lead to
a particular disease phenotype, functional protein assays have proven to be
effective screening tools.

Biochemical studies may be performed to determine whether a candidate
sequence polymorphism in the ASTH1 coding region or control regions is
associated with disease. For example, a change in the promoter or enhancer
sequence that affects expression of ASTH1 may result in predisposition to asthma.
Expression levels of a candidate variant allele are compared to expression levels of
the normal allele by various methods known in the art. Methods for determining
promoter or enhancer strength include quantitation of the expressed natural protein;
insertion of the variant control element into a vector with a reporter gene such as
β-galactosidase, luciferase, chloramphenicol acetyltransferase, etc. that provides
for convenient quantitation; and the like. The activity of the encoded ASTH1 protein
may be determined by comparison with the wild-type protein.
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A number of methods are available for analyzing nucleic acids for tiie 
presence of a specific sequence. Where large amounts of DNA are available, 
genomic DNA is used directly. Alternatively, the region of interest is cloned into a 
suitable vector and grown in sufficient quantity for analysis. Cells that express 
ASTH1 genes, such as trachea cells, may be used as a source of mRNA, which 
may be assayed directly or reverse transcribed into cDNA for analysis. The nucleic 
acid may be amplified by conventional techniques, such as the polymerase chain 
reaction (PCR), to provide sufficient amounts for analysis. The use of the 
polymerase chain reaction is described in Saiki et al. (1985) Science 239:487, and 
a review of current techniques may be found in Sambrook et al. Molecular Cloning: 
A Laboratory Manual, CSH Press 1989, pp.14.2-14.33. Amplification may also be 
used to determine whether a polymorphism is present, by using a primer that is 
specific for the polymorphism. Alternatively, various methods are known in the art 
that utilize oligonucleotide ligation as a means of detecting polymorphisms; for 
examples, see Riley et al. (1990) N.A.R. 18:2887-2890; and Delahunty et al. (1996) 
Am. J. Hum. Genet. 58:1239-1246. 

A detectable label may be included in an amplification reaction. Suitable 
labels include fluorochromes, e.g. fluorescein isothiocyanate (FITC), rhodamine, 
Texas Red, phycoerythrin, allophycocyanin, 6-carboxyfluorescein (6-FAM), 
2',7'-dimethoxy-4',5'-dichloro-6-carboxyfluorescein (JOE), 6-carboxy-X-rhodamine 
(ROX), 6-carboxy-2',4',7',4,7-hexachlorofluorescein (HEX), 5-carboxyfluorescein 
(5-FAM) or N,N,N',N'-tetramethyl-6-carboxyrhodamine (TAMRA); radioactive labels, 
e.g. ³²P, ³⁵S, ³H; etc. The label may be a two stage system, where the amplified 
DNA is conjugated to biotin, haptens, etc. having a high affinity binding partner, e.g. 
avidin, specific antibodies, etc., where the binding partner is conjugated to a 
detectable label. The label may be conjugated to one or both of the primers. 
Alternatively, the pool of nucleotides used in the amplification is labeled, so as to 
incorporate the label into the amplification product. 

The sample nucleic acid, e.g. amplified or cloned fragment, is analyzed by 
one of a number of methods known in the art. The nucleic acid may be sequenced 
by dideoxy or other methods, and the sequence of bases compared to a neutral 
ASTH1 sequence. Hybridization with the variant sequence may also be used to 
determine its presence, by Southern blots, dot blots, etc. The hybridization pattern 
of a control and variant sequence to an array of oligonucleotide probes immobilised 
on a solid support, as described in US 5,445,934, or in WO95/35505, may also be 
used as a means of detecting the presence of variant sequences. Single strand 
conformational polymorphism (SSCP) analysis, denaturing gradient gel 
electrophoresis (DGGE), mismatch cleavage detection, and heteroduplex analysis 
in gel matrices are used to detect conformational changes created by DNA 
sequence variation as alterations in electrophoretic mobility. Alternatively, where a 
polymorphism creates or destroys a recognition site for a restriction endonuclease 
(restriction fragment length polymorphism, RFLP), the sample is digested with that 
endonuclease, and the products size fractionated to determine whether the 
fragment was digested. Fractionation is performed by gel or capillary 
electrophoresis, particularly acrylamide or agarose gels. 
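
As a rough illustration of the RFLP logic described above, the following Python sketch digests two short amplicons in silico and compares the resulting fragment sizes. The two sequences, the choice of EcoRI (G^AATTC) and the function name `digest` are assumptions made for illustration only; they are not taken from the ASTH1 locus. Only the given strand is searched, which is adequate for a palindromic recognition site.

```python
def digest(seq: str, site: str, cut_offset: int) -> list[int]:
    """Return fragment lengths after cutting seq at every occurrence of site.

    cut_offset is the position within the recognition site at which the
    enzyme cuts (1 for EcoRI, which cuts G^AATTC).
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Hypothetical 30-nt amplicons that differ at a single base.
ALLELE_1 = "ATGCCGTTAGGAATTCCTAGGCTTAACGTA"  # contains one GAATTC site
ALLELE_2 = "ATGCCGTTAGGAGTTCCTAGGCTTAACGTA"  # A>G substitution destroys the site

frags_1 = digest(ALLELE_1, "GAATTC", 1)
frags_2 = digest(ALLELE_2, "GAATTC", 1)
print(frags_1, frags_2)
```

An allele that retains the site yields two fragments, while the allele that has lost it runs as a single uncut band, which is the difference that size fractionation is meant to reveal.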

The hybridization pattern of a control and variant sequence to an array of 
oligonucleotide probes immobilised on a solid support, as described in US 
5,445,934, or in WO95/35505, may be used as a means of detecting the presence 
of variant sequences. In one embodiment of the invention, an array of 
oligonucleotides is provided, where discrete positions on the array are 
complementary to at least a portion of mRNA or genomic DNA of the ASTH1 locus. 
Such an array may comprise a series of oligonucleotides, each of which can 
specifically hybridize to a nucleic acid, e.g. mRNA, cDNA, genomic DNA, etc. from 
the ASTH1 locus. 

An array may include all or a subset of the polymorphisms listed in Table 3 
(SEQ ID NOs:16-126). One or both polymorphic forms may be present in the array; 
for example the polymorphism of SEQ ID NO:12 and 13 may be represented by 
either, or both, of the listed sequences. Usually such an array will include at least 2 
different polymorphic sequences, i.e. polymorphisms located at unique positions 
within the locus, usually at least about 5, more usually at least about 10, and may 
include as many as 50 to 100 different polymorphisms. The oligonucleotide 
sequence on the array will usually be at least about 12 nt in length, may be the 
length of the provided polymorphic sequences, or may extend into the flanking 
regions to generate fragments of 100 to 200 nt in length. For examples of arrays, 
see Hacia et al. (1996) Nature Genetics 14:441-447; Lockhart et al. (1996) Nature 
Biotechnol. 14:1675-1680; and DeRisi et al. (1996) Nature Genetics 14:457-460. 

Antibodies specific for ASTH1 polymorphisms may be used in screening 
immunoassays. A reduction or increase in neutral ASTH1 and/or presence of 
asthma associated polymorphisms is indicative that asthma is ASTH1-associated. 
A sample is taken from a patient suspected of having ASTH1-associated asthma. 
Samples, as used herein, include biological fluids such as tracheal lavage, blood, 
cerebrospinal fluid, tears, saliva, lymph, dialysis fluid and the like; organ or tissue 
culture derived fluids; and fluids extracted from physiological tissues. Also included 
in the term are derivatives and fractions of such fluids. Biopsy samples are of 
particular interest, e.g. trachea scrapings, etc. The number of cells in a sample will 
generally be at least about 10³, usually at least 10⁴, more usually at least about 10⁵. 
The cells may be dissociated, in the case of solid tissues, or tissue sections may be 
analyzed. Alternatively a lysate of the cells may be prepared. 

Diagnosis may be performed by a number of methods. The different 
methods all determine the absence or presence or altered amounts of normal or 
abnormal ASTH1 in patient cells suspected of having a predisposing polymorphism 
in ASTH1. For example, detection may utilize staining of cells or histological 
sections, performed in accordance with conventional methods. The antibodies of 
interest are added to the cell sample, and incubated for a period of time sufficient to 
allow binding to the epitope, usually at least about 10 minutes. The antibody may 
be labeled with radioisotopes, enzymes, fluorescers, chemiluminescers, or other 
labels for direct detection. Alternatively, a second stage antibody or reagent is used 
to amplify the signal. Such reagents are well known in the art. For example, the 
primary antibody may be conjugated to biotin, with horseradish 
peroxidase-conjugated avidin added as a second stage reagent. Final detection uses a 
substrate that undergoes a color change in the presence of the peroxidase. The 
absence or presence of antibody binding may be determined by various methods, 
including flow cytometry of dissociated cells, microscopy, radiography, scintillation 
counting, etc. 

An alternative method for diagnosis depends on the in vitro detection of 
binding between antibodies and ASTH1 in a lysate. Measuring the concentration of 
ASTH1 binding in a sample or fraction thereof may be accomplished by a variety of 
specific assays. A conventional sandwich type assay may be used. For example, a 
sandwich assay may first attach ASTH1-specific antibodies to an insoluble surface 
or support. The particular manner of binding is not crucial so long as it is 
compatible with the reagents and overall methods of the invention. The antibodies may be 
bound to the plates covalently or non-covalently, preferably non-covalently. 

The insoluble supports may be any compositions to which polypeptides can 
be bound, which are readily separated from soluble material, and which are otherwise 
compatible with the overall method. The surface of such supports may be solid or 
porous and of any convenient shape. Examples of suitable insoluble supports to 
which the receptor is bound include beads, e.g. magnetic beads, membranes and 
microtiter plates. These are typically made of glass, plastic (e.g. polystyrene), 
polysaccharides, nylon or nitrocellulose. Microtiter plates are especially convenient 
because a large number of assays can be carried out simultaneously, using small 
amounts of reagents and samples. 

Patient sample lysates are then added to separately assayable supports (for 
example, separate wells of a microtiter plate) containing antibodies. Preferably, a 
series of standards, containing known concentrations of normal and/or abnormal 
ASTH1, is assayed in parallel with the samples or aliquots thereof to serve as 
controls. Preferably, each sample and standard will be added to multiple wells so 
that mean values can be obtained for each. The incubation time should be 
sufficient for binding; generally, from about 0.1 to 3 hr is sufficient. After incubation, 
the insoluble support is generally washed of non-bound components. Generally, a 
dilute non-ionic detergent medium at an appropriate pH, generally 7-8, is used as a 
wash medium. From one to six washes may be employed, with sufficient volume to 
thoroughly wash non-specifically bound proteins present in the sample. 
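
The use of replicate wells and a parallel series of standards can be sketched in Python as follows. This is a minimal illustration, assuming a signal that rises monotonically with concentration and simple linear interpolation between adjacent standards; the signal values, concentration units and function names are hypothetical and are not taken from the specification.

```python
from statistics import mean


def standard_curve(standards: dict[float, list[float]]) -> list[tuple[float, float]]:
    """Average replicate wells and return (mean_signal, concentration) pairs
    sorted by signal."""
    return sorted((mean(signals), conc) for conc, signals in standards.items())


def interpolate(curve: list[tuple[float, float]], signal: float) -> float:
    """Estimate concentration by linear interpolation between the two
    standards whose mean signals bracket the sample signal."""
    for (s0, c0), (s1, c1) in zip(curve, curve[1:]):
        if s0 <= signal <= s1:
            return c0 + (c1 - c0) * (signal - s0) / (s1 - s0)
    raise ValueError("signal lies outside the range of the standards")


# Hypothetical standards: concentration (arbitrary units) -> replicate signals.
STANDARDS = {0.0: [0.05, 0.07], 10.0: [0.50, 0.54], 20.0: [0.98, 1.02]}
SAMPLE_WELLS = [0.74, 0.78]  # replicate wells of one patient lysate

curve = standard_curve(STANDARDS)
estimate = interpolate(curve, mean(SAMPLE_WELLS))
print(round(estimate, 2))
```

In practice a fitted curve (for example a four-parameter logistic) is often preferred over piecewise interpolation; the linear form is used here only to keep the sketch short.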

After washing, a solution containing a second antibody is applied. The 
antibody will bind ASTH1 with sufficient specificity such that it can be distinguished 
from other components present. The second antibodies may be labeled to facilitate 
direct or indirect quantification of binding. Examples of labels that permit direct 
measurement of second receptor binding include radiolabels, such as ³H or ¹²⁵I, 
fluorescers, dyes, beads, chemiluminescers, colloidal particles, and the like. 
Examples of labels which permit indirect measurement of binding include enzymes 
where the substrate may provide for a colored or fluorescent product. In a preferred 
embodiment, the antibodies are labeled with a covalently bound enzyme capable of 
providing a detectable product signal after addition of suitable substrate. Examples 
of suitable enzymes for use in conjugates include horseradish peroxidase, alkaline 
phosphatase, malate dehydrogenase and the like. Where not commercially 
available, such antibody-enzyme conjugates are readily produced by techniques 
known to those skilled in the art. The incubation time should be sufficient for the 
labeled ligand to bind available molecules. Generally, from about 0.1 to 3 hr is 
sufficient, usually 1 hr sufficing. 

After the second binding step, the insoluble support is again washed free of 
non-specifically bound material. The signal produced by the bound conjugate is 
detected by conventional means. Where an enzyme conjugate is used, an 
appropriate enzyme substrate is provided so a detectable product is formed. 

Other immunoassays are known in the art and may find use as diagnostics. 
Ouchterlony plates provide a simple determination of antibody binding. Western 
blots may be performed on protein gels or protein spots on filters, using a detection 
system specific for ASTH1 as desired, conveniently using a labeling method as 
described for the sandwich assay. 

Other diagnostic assays of interest are based on the functional properties of 
ASTH1 proteins. Such assays are particularly useful where a large number of 
different sequence changes lead to a common phenotype, i.e. altered protein 
function leading to bronchial hyperreactivity. For example, a functional assay may 
be based on the transcriptional changes mediated by ASTH1 gene products. Other 
assays may, for example, detect conformational changes, size changes resulting 
from insertions, deletions or truncations, or changes in the subcellular localization of 
ASTH1 proteins. 

In a protein truncation test, PCR fragments amplified from the ASTH1 gene 
or its transcript are used as templates for in vitro transcription/translation reactions 
to generate protein products. Separation by gel electrophoresis is performed to 
determine whether the polymorphic gene encodes a truncated protein, where 
truncations may be associated with a loss of function. 

Diagnostic screening may also be performed for polymorphisms that are 
genetically linked to a predisposition for bronchial hyperreactivity, particularly 
through the use of microsatellite markers or single nucleotide polymorphisms. 
Frequently the microsatellite polymorphism itself is not phenotypically expressed, 
but is linked to sequences that result in a disease predisposition. However, in some 
cases the microsatellite sequence itself may affect gene expression. Microsatellite 
linkage analysis may be performed alone, or in combination with direct detection of 
polymorphisms, as described above. The use of microsatellite markers for 
genotyping is well documented. For examples, see Mansfield et al. (1994) 
Genomics 24:225-233; Ziegle et al. (1992) Genomics 14:1026-1031; Dib et al., 
supra. 

Microsatellite loci that are useful in the subject methods have the general 
formula: 

U (R)n U', where 

U and U' are non-repetitive flanking sequences that uniquely identify the particular 
locus, R is a repeat motif, and n is the number of repeats. The repeat motif is at 
least 2 nucleotides in length, up to 7, usually 2-4 nucleotides in length. Repeats 
can be simple or complex. The flanking sequences U and U' uniquely identify the 
microsatellite locus within the human genome. U and U' are at least about 18 
nucleotides in length, and may extend several hundred bases up to about 1 kb on 
either side of the repeat. Within U and U', sequences are selected for amplification 
primers. The exact composition of the primer sequences is not critical to the 
invention, but they must hybridize to the flanking sequences U and U', respectively, 
under stringent conditions. Criteria for selection of amplification primers are as 
previously discussed. To maximize the resolution of size differences at the locus, it 
is preferable to choose a primer sequence that is close to the repeat sequence, such 
that the total amplification product is between 100-500 nucleotides in length. 
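
The general formula U (R)n U' can be illustrated with a short Python sketch that locates the two flanking sequences in a read and counts the repeats between them. The flanking sequences below are invented 18-nt placeholders, the function name `count_repeats` is hypothetical, and only simple (uninterrupted) repeats are handled; complex repeats would need a more permissive matcher.

```python
from typing import Optional


def count_repeats(seq: str, left_flank: str, right_flank: str, motif: str) -> Optional[int]:
    """Return n for a locus of the form U (R)n U', or None if the flanks are
    not found or the tract between them is not a pure run of the motif."""
    i = seq.find(left_flank)
    if i == -1:
        return None
    tract_start = i + len(left_flank)
    j = seq.find(right_flank, tract_start)
    if j == -1:
        return None
    tract = seq[tract_start:j]
    n, remainder = divmod(len(tract), len(motif))
    if remainder or tract != motif * n:
        return None
    return n


# Hypothetical flanking sequences (18 nt each) around a TATC repeat.
U_LEFT = "GATTACAGGCTTAACCGT"
U_RIGHT = "CCGGAATTGCACTGATCA"
READ = U_LEFT + "TATC" * 9 + U_RIGHT

n_repeats = count_repeats(READ, U_LEFT, U_RIGHT, "TATC")
print(n_repeats)
```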

The number of repeats at a specific locus, n, is polymorphic in a population, 
thereby generating individual differences in the length of DNA that lies between the 
amplification primers. The number will vary from at least 1 repeat to as many as 
about 100 repeats or more. 

The primers are used to amplify the region of genomic DNA that contains the 
repeats. Conveniently, a detectable label will be included in the amplification 
reaction, as previously described. Multiplex amplification may be performed in 
which several sets of primers are combined in the same reaction tube. This is 
particularly advantageous when limited amounts of sample DNA are available for 
analysis. Conveniently, each of the sets of primers is labeled with a different 
fluorochrome. 

After amplification, the products are size fractionated. Fractionation may be 
performed by gel electrophoresis, particularly denaturing acrylamide or agarose 
gels. A convenient system uses denaturing polyacrylamide gels in combination with 
an automated DNA sequencer, see Hunkapiller et al. (1991) Science 254:69-74. 
The automated sequencer is particularly useful with multiplex amplification or 
pooled products of separate PCR reactions. Capillary electrophoresis may also be 
used for fractionation. A review of capillary electrophoresis may be found in 
Landers et al. (1993) BioTechniques 14:98-111. The size of the amplification 
product is proportional to the number of repeats (n) that are present at the locus 
specified by the primers. The size will be polymorphic in the population, and is 
therefore an allelic marker for that locus. 
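
Because the flanking sequence contributed by the primers is constant, the size of the product is the flank length plus n times the motif length, so two alleles differ in size by a whole number of motif lengths. The Python sketch below converts a measured fragment size into a repeat offset relative to a reference allele. The fragment sizes are hypothetical, and the 4-nt motif length is an assumption matching a tetranucleotide repeat.

```python
def repeat_offset(fragment_nt: int, reference_nt: int, motif_len: int = 4) -> int:
    """Number of repeats gained (+) or lost (-) relative to a reference allele,
    inferred from the difference in amplification product size."""
    diff = fragment_nt - reference_nt
    if diff % motif_len:
        raise ValueError("size difference is not a whole number of repeats")
    return diff // motif_len


REFERENCE_NT = 240  # hypothetical product size for the reference allele

offset_long = repeat_offset(252, REFERENCE_NT)   # 12 nt longer
offset_short = repeat_offset(232, REFERENCE_NT)  # 8 nt shorter
print(offset_long, offset_short)
```

Measured sizes from gels or capillary instruments carry some error, so a real allele-calling step would bin sizes before applying this arithmetic.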

A number of markers in the region of the ASTH1 locus have been identified, 
and are listed in Table 1 in the Experimental section (SEQ ID NOs: 160-273). Of 
particular interest for diagnostic purposes is the marker D11S2008, in which 
individuals having alleles C or F at this locus, particularly in combination with the 
CAAT box polymorphism and other polymorphisms, are predisposed to develop 
bronchial hyperreactivity or asthma. The association of D11S2008 alleles is as 
follows: 

Allele              Association with asthma   Number of TATC repeats relative to allele C 

A                   no                        -2 
B                   no                        -1 
C (SEQ ID NO:15)    yes                       equivalent 
D                   no                        +1 
E                   no                        +2 
F                   yes                       +3 
G                   no                        +4 
H                   no                        +5 

A DNA sequence of interest for diagnosis comprises the D11S2008 primer 
sequences shown in Table 1 (SEQ ID NO:242 and 243), flanking one or three 
repeats of SEQ ID NO: 15. 

Other microsatellite markers of interest for diagnostic purposes are CA39_2; 
774F; 774J; 7740; L19PENTA1; 65P14TE1; AFM205YG5; D11S907; D11S4200; 
774N; CA11-11; 774L; AFM283WH9; ASMI14 and D11S1900 (primer sequences 
are provided in Table 1, the repeats are provided in Table 1B). 

Regulation of ASTH1 Expression 

The ASTH1 genes are useful for analysis of ASTH1 expression, e.g. in 
determining developmental and tissue specific patterns of expression, and for 
modulating expression in vitro and in vivo. The regulatory region of SEQ ID NO:1 
may also be used to investigate ASTH1 expression. Vectors useful for 
introduction of the gene include plasmids and viral vectors. Of particular interest 
are retroviral-based vectors, e.g. Moloney murine leukemia virus and modified 
human immunodeficiency virus; adenovirus vectors, etc. that are maintained 
transiently or stably in mammalian cells. A wide variety of vectors can be employed 
for transfection and/or integration of the gene into the genome of the cells. 
Alternatively, micro-injection, fusion, or the like may be employed for introduction of 
genes into a suitable host cell. See, for example, Dhawan et al. (1991) Science 
254:1509-1512 and Smith et al. (1990) Molecular and Cellular Biology 3268-3271. 

Administration of vectors to the lungs is of particular interest. Frequently 
such methods utilize liposomal formulations, as described in Eastman et al. (1997) 
Hum Gene Ther 8:765-773; Oudrhiri et al. (1997) P.N.A.S. 94:1651-1656; 
McDonald et al. (1997) Hum Gene Ther 8:411-422. 

The expression vector will have a transcriptional initiation region oriented to 
produce functional mRNA. The native transcriptional initiation region, e.g. SEQ ID 
NO: 14, or an exogenous transcriptional initiation region may be employed. The 
promoter may be introduced by recombinant methods in vitro, or as the result of 
homologous integration of the sequence into a chromosome. Many strong 
promoters are known in the art, including the β-actin promoter, SV40 early and late 
promoters, human cytomegalovirus promoter, retroviral LTRs, metallothionein 
responsive element (MRE), tetracycline-inducible promoter constructs, etc. 

Expression vectors generally have convenient restriction sites located near 
the promoter sequence to provide for the insertion of nucleic acid sequences. 
Transcription cassettes may be prepared comprising a transcription initiation region, 
the target gene or fragment thereof, and a transcriptional termination region. The 
transcription cassettes may be introduced into a variety of vectors, e.g. plasmid; 
retrovirus, e.g. lentivirus; adenovirus; and the like, where the vectors are able to 
transiently or stably be maintained in the cells, usually for a period of at least about 
one day, more usually for a period of at least about several days to several weeks. 

Antisense molecules are used to down-regulate expression of ASTH1 in 
cells. The anti-sense reagent may be antisense oligonucleotides (ODN), 
particularly synthetic ODN having chemical modifications from native nucleic acids, 
or nucleic acid constructs that express such anti-sense molecules as RNA. The 
antisense sequence is complementary to the mRNA of the targeted gene, and 
inhibits expression of the targeted gene products. Antisense molecules inhibit gene 
expression through various mechanisms, e.g. by reducing the amount of mRNA 
available for translation, through activation of RNAse H, or steric hindrance. One or 
a combination of antisense molecules may be administered, where a combination 
may comprise multiple different sequences. 

Antisense molecules may be produced by expression of all or a part of the 
target gene sequence in an appropriate vector, where the transcriptional initiation is 
oriented such that an antisense strand is produced as an RNA molecule. 
Alternatively, the antisense molecule is a synthetic oligonucleotide. Antisense 
oligonucleotides will generally be at least about 7, usually at least about 12, more 
usually at least about 20 nucleotides in length, and not more than about 500, 
usually not more than about 50, more usually not more than about 35 nucleotides in 
length, where the length is governed by efficiency of inhibition, specificity, including 
absence of cross-reactivity, and the like. It has been found that short 
oligonucleotides, of from 7 to 8 bases in length, can be strong and selective 
inhibitors of gene expression (see Wagner et al. (1996) Nature Biotechnology 
14:840-844). 

A specific region or regions of the endogenous sense strand mRNA 
sequence is chosen to be complemented by the antisense sequence. Selection of 
a specific sequence for the oligonucleotide may use an empirical method, where 
several candidate sequences are assayed for inhibition of expression of the target 
gene in an in vitro or animal model. A combination of sequences may also be used, 
where several regions of the mRNA sequence are selected for antisense 
complementation. 
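
The empirical selection described above starts from a set of candidate sequences complementary to different regions of the mRNA. The Python sketch below generates such candidates as DNA oligonucleotides by taking the reverse complement of successive windows. The mRNA string is an invented placeholder rather than ASTH1 sequence, and the window length, step size and function names are assumptions chosen for illustration; the assay for inhibition itself is not modeled.

```python
# Maps each RNA base to its complementary DNA base.
_RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")


def antisense_oligo(mrna: str, start: int, length: int) -> str:
    """Return the DNA oligonucleotide (5'->3') complementary to
    mrna[start:start+length]."""
    target = mrna[start:start + length]
    if len(target) != length:
        raise ValueError("window extends past the end of the mRNA")
    return target.translate(_RNA_TO_DNA_COMPLEMENT)[::-1]


def candidate_oligos(mrna: str, length: int = 20, step: int = 10) -> list[tuple[int, str]]:
    """Tile the mRNA with windows and return (start, antisense oligo) pairs."""
    return [
        (start, antisense_oligo(mrna, start, length))
        for start in range(0, len(mrna) - length + 1, step)
    ]


# Hypothetical 40-nt mRNA fragment used only to exercise the functions.
MRNA = "AUGGCUAGCUUAGGCCAUCGAUCGGAUUACGCUAGCAUGC"

candidates = candidate_oligos(MRNA)
for start, oligo in candidates:
    print(start, oligo)
```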

Antisense oligonucleotides may be chemically synthesized by methods 
known in the art (see Wagner et al. (1993) supra, and Milligan et al., supra.) 
Preferred oligonucleotides are chemically modified from the native phosphodiester 
structure, in order to increase their intracellular stability and binding affinity. A 
number of such modifications have been described in the literature, which alter the 
chemistry of the backbone, sugars or heterocyclic bases. 

Among useful changes in the backbone chemistry are phosphorothioates; 
phosphorodithioates, where both of the non-bridging oxygens are substituted with 
sulfur; phosphoroamidites; alkyl phosphotriesters and boranophosphates. Achiral 
phosphate derivatives include 3'-O-5'-S-phosphorothioate, 
3'-S-5'-O-phosphorothioate, 3'-CH2-5'-O-phosphonate and 3'-NH-5'-O-phosphoroamidate. 
Peptide nucleic acids replace the entire ribose phosphodiester backbone with a 
peptide linkage. Sugar modifications are also used to enhance stability and affinity. 
The α-anomer of deoxyribose may be used, where the base is inverted with respect 
to the natural β-anomer. The 2'-OH of the ribose sugar may be altered to form 
2'-O-methyl or 2'-O-allyl sugars, which provides resistance to degradation without 
compromising affinity. Modification of the heterocyclic bases must maintain proper 
base pairing. Some useful substitutions include deoxyuridine for deoxythymidine; 
5-methyl-2'-deoxycytidine and 5-bromo-2'-deoxycytidine for deoxycytidine. 
5-propynyl-2'-deoxyuridine and 5-propynyl-2'-deoxycytidine have been shown to 
increase affinity and biological activity when substituted for deoxythymidine and 
deoxycytidine, respectively. 

As an alternative to anti-sense inhibitors, catalytic nucleic acid compounds, 
e.g. ribozymes, anti-sense conjugates, etc. may be used to inhibit gene expression. 
Ribozymes may be synthesized in vitro and administered to the patient, or may be 
encoded on an expression vector, from which the ribozyme is synthesized in the 
targeted cell (for example, see International patent application WO 9523225, and 
Beigelman et al. (1995) Nucl. Acids Res 23:4434-42). Examples of oligonucleotides 
with catalytic activity are described in WO 9506764. Conjugates of anti-sense ODN 
with a metal complex, e.g. terpyridylCu(II), capable of mediating mRNA hydrolysis 
are described in Bashkin et al. (1995) Appl Biochem Biotechnol 54:43-56. 

Therapeutic Use of ASTH1 Protein 

A host may be treated with intact ASTH1 protein, or an active fragment 
thereof to modulate or reduce bronchial hyperreactivity. Desirably, the peptides will 
not induce an immune response, particularly an antibody response. Xenogeneic 
analogs may be screened for their ability to provide a therapeutic effect without 
raising an immune response. The protein or peptides may also be administered to 
in vitro cell cultures. 

Various methods for administration may be employed. The polypeptide 
formulation may be given orally, or may be injected intravascularly, subcutaneously, 
peritoneally, etc. Methods of administration by inhalation are well-known in the art. 
The dosage of the therapeutic formulation will vary widely, depending upon the 
nature of the disease, the frequency of administration, the manner of administration, 
the clearance of the agent from the host, and the like. The initial dose may be 
larger, followed by smaller maintenance doses. The dose may be administered as 
infrequently as weekly or biweekly, or fractionated into smaller doses and 
administered daily, semi-weekly, etc. to maintain an effective dosage level. In many 
cases, oral administration will require a higher dose than if administered 
intravenously. The amide bonds, as well as the amino and carboxy termini, may be 
modified for greater stability on oral administration. 

The subject peptides may be prepared as formulations at a 
pharmacologically effective dose in pharmaceutically acceptable media, for 
example normal saline, PBS, etc. The additives may include bactericidal agents, 
stabilizers, buffers, or the like. In order to enhance the half-life of the subject 
peptide or subject peptide conjugates, the peptides may be encapsulated, 
introduced into the lumen of liposomes, prepared as a colloid, or another 
conventional technique may be employed that provides for an extended lifetime of 
the peptides. 

The peptides may be administered as a combination therapy with other 
pharmacologically active agents. The additional drugs may be administered 
separately or in conjunction with the peptide compositions, and may be included in 
the same formulation. 

Models for Asthma 

The subject nucleic acids can be used to generate genetically modified 
non-human animals or site specific gene modifications in cell lines. The term 
"transgenic" is intended to encompass genetically modified animals having a 
deletion or other knock-out of ASTH1 gene activity, having an exogenous ASTH1 
gene that is stably transmitted in the host cells, or having an exogenous ASTH1 
promoter operably linked to a reporter gene. Transgenic animals may be made 
through homologous recombination, where the ASTH1 locus is altered. 
Alternatively, a nucleic acid construct is randomly integrated into the genome. 
Vectors for stable integration include plasmids, retroviruses and other animal 
viruses, YACs, and the like. Of interest are transgenic mammals, e.g. cows, pigs, 
goats, horses, etc., and particularly rodents, e.g. rats, mice, etc. 

A "knock-out" animal is genetically manipulated to substantially reduce, or 
eliminate endogenous ASTH1 function. Different approaches may be used to 
achieve the "knock-out". A chromosomal deletion of all or part of the native ASTH1 
homolog may be induced. Deletions of the non-coding regions, particularly the 
promoter region, 3' regulatory sequences, enhancers, or deletions of genes that 
activate expression of ASTH1 genes, may also be used. A functional knock-out may 
also be achieved 
by the introduction of an anti-sense construct that blocks expression of the native 
ASTH1 genes (for example, see Li and Cohen (1996) Cell 85:319-329). 

Transgenic animals may be made having exogenous ASTH1 genes. The 
exogenous gene is usually either from a different species than the animal host, or is 
otherwise altered in its coding or non-coding sequence. The introduced gene may 
be a wild-type gene, naturally occurring polymorphism, or a genetically manipulated 
sequence, for example those previously described with deletions, substitutions or 
insertions in the coding or non-coding regions. The introduced sequence may 
encode an ASTH1 polypeptide, or may utilize the ASTH1 promoter operably linked 
to a reporter gene. Where the introduced gene is a coding sequence, it is usually 
operably linked to a promoter, which may be constitutive or inducible, and other 
regulatory sequences required for expression in the host animal. 

Specific constructs of interest include, but are not limited to, anti-sense 
ASTH1, which will block ASTH1 expression, expression of dominant negative 
ASTH1 mutations, and over-expression of an ASTH1 gene. A detectable marker, 
such as lac Z may be introduced into the ASTH1 locus, where upregulation of 
ASTH1 expression will result in an easily detected change in phenotype. 
Constructs utilizing the ASTH1 promoter region, e.g. SEQ ID NO:1; SEQ ID NO:14, 
in combination with a reporter gene or with the coding region of ASTH1J or ASTH1I 
are also of interest. 

The modified cells or animals are useful in the study of ASTH1 function and 
regulation. Animals may be used in functional studies, drug screening, etc., e.g. to 
determine the effect of a candidate drug on asthma. A series of small deletions 
and/or substitutions may be made in the ASTH1 gene to determine the role of 
different exons in DNA binding, transcriptional regulation, etc. By providing 
expression of ASTH1 protein in cells in which it is otherwise not normally produced, 
one can induce changes in cell behavior. These animals are also useful for 
exploring models of inheritance of asthma, e.g. dominant vs. recessive; relative 
effects of different alleles and synergistic effects between ASTH1I and ASTH1J and 
other asthma genes elsewhere in the genome. 

DNA constructs for homologous recombination will comprise at least a 
portion of the ASTH1 gene with the desired genetic modification, and will include 
regions of homology to the target locus. DNA constructs for random integration 
need not include regions of homology to mediate recombination. Conveniently, 
markers for positive and negative selection are included. Methods for generating 
cells having targeted gene modifications through homologous recombination are 
known in the art. For various techniques for transfecting mammalian cells, see 
Keown et al. (1990) Methods in Enzymology 185:527-537. 

For embryonic stem (ES) cells, an ES cell line may be employed, or
embryonic cells may be obtained freshly from a host, e.g. mouse, rat, guinea pig,
etc. Such cells are grown on an appropriate fibroblast-feeder layer or grown in the
presence of appropriate growth factors, such as leukemia inhibiting factor (LIF).
When ES cells have been transformed, they may be used to produce transgenic
animals. After transformation, the cells are plated onto a feeder layer in an
appropriate medium. Cells containing the construct may be detected by employing
a selective medium. After sufficient time for colonies to grow, they are picked and
analyzed for the occurrence of homologous recombination or integration of the
construct. Those colonies that are positive may then be used for embryo
manipulation and blastocyst injection. Blastocysts are obtained from 4 to 6 week
old superovulated females. The ES cells are trypsinized, and the modified cells are
injected into the blastocoel of the blastocyst. After injection, the blastocysts are
returned to each uterine horn of pseudopregnant females. Females are then
allowed to go to term and the resulting litters screened for mutant cells having the
construct. By providing for a different phenotype of the blastocyst and the ES cells,
chimeric progeny can be readily detected.

The chimeric animals are screened for the presence of the modified gene 
and males and females having the modification are mated to produce homozygous 
progeny. If the gene alterations cause lethality at some point in development, 
tissues or organs can be maintained as allogeneic or congenic grafts or transplants, 
or in in vitro culture. 

Investigation of genetic function may utilize non-mammalian models, 
particularly using those organisms that are biologically and genetically 
well-characterized, such as C. elegans, D. melanogaster and S. cerevisiae. For
example, transposon (Tc1) insertions in the nematode homolog of an ASTH1 gene
or promoter region may be made. The subject gene sequences may be used to
knock-out or to complement defined genetic lesions in order to determine the
physiological and biochemical pathways involved in ASTH1 function. A number of 
human genes have been shown to complement mutations in lower eukaryotes. 

Drug screening may be performed in combination with the subject animal 
models. Many mammalian genes have homologs in yeast and lower animals. The 
study of such homologs' physiological role and interactions with other proteins can 
facilitate understanding of biological function. In addition to model systems based 
on genetic complementation, yeast has been shown to be a powerful tool for 
studying protein-protein interactions through the two hybrid system described in
Chien et al. (1991) P.N.A.S. 88:9578-9582. Two-hybrid system analysis is of
particular interest for exploring transcriptional activation by ASTH1 proteins. 

Drug Screening Assays 
By providing for the production of large amounts of ASTH1 protein, one can
identify ligands or substrates that bind to, modulate or mimic the action of ASTH1.
Areas of investigation include the development of asthma treatments. Drug screening
identifies agents that provide a replacement or enhancement for ASTH1 function in
affected cells. Conversely, agents that reverse or inhibit ASTH1 function may
stimulate bronchial reactivity. Of particular interest are screening assays for agents
that have a low toxicity for human cells. A wide variety of assays may be used for
this purpose, including labeled in vitro protein-protein binding assays, protein-DNA
binding assays, electrophoretic mobility shift assays, immunoassays for protein
binding, and the like. The purified protein may also be used for determination of
three-dimensional crystal structure, which can be used for modeling intermolecular
interactions, transcriptional regulation, etc.

The term "agent" as used herein describes any molecule, e.g. protein or
pharmaceutical, with the capability of altering or mimicking the physiological
function of ASTH1. Generally a plurality of assay mixtures are run in parallel with
different agent concentrations to obtain a differential response to the various
concentrations. Typically, one of these concentrations serves as a negative control,
i.e. at zero concentration or below the level of detection.
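
The parallel-concentration design described above can be sketched as follows.
This is a minimal illustration only: the ten-fold dilution factor, the number of
points and the top concentration are hypothetical values chosen for the example
and are not taken from the text. The zero-concentration entry plays the role of
the negative control.

```python
def agent_concentrations(top, n_points, fold=10.0):
    """Return a descending serial dilution plus a zero-concentration control.

    top, n_points and fold are illustrative assumptions, not values from the text.
    """
    if top <= 0 or n_points < 1 or fold <= 1:
        raise ValueError("top must be > 0, n_points >= 1 and fold > 1")
    series = [top / fold**i for i in range(n_points)]
    series.append(0.0)  # negative control: no agent present
    return series


# Hypothetical example: four dilutions starting from 100 (arbitrary units).
concentrations = agent_concentrations(top=100.0, n_points=4)
```

Each assay mixture would then receive one entry of the returned list, so that
the response can be compared across concentrations and against the control.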

Candidate agents encompass numerous chemical classes, though typically
they are organic molecules, preferably small organic compounds having a
molecular weight of more than 50 and less than about 2,500 daltons. Candidate
agents comprise functional groups necessary for structural interaction with proteins,
particularly hydrogen bonding, and typically include at least an amine, carbonyl,
hydroxyl or carboxyl group, preferably at least two of the functional chemical
groups. The candidate agents often comprise cyclical carbon or heterocyclic
structures and/or aromatic or polyaromatic structures substituted with one or more
of the above functional groups. Candidate agents are also found among
biomolecules including, but not limited to: peptides, saccharides, fatty acids,
steroids, purines, pyrimidines, derivatives, structural analogs or combinations
thereof.

Candidate agents are obtained from a wide variety of sources including
libraries of synthetic or natural compounds. For example, numerous means are
available for random and directed synthesis of a wide variety of organic compounds
and biomolecules, including expression of randomized oligonucleotides and
oligopeptides. Alternatively, libraries of natural compounds in the form of bacterial,
fungal, plant and animal extracts are available or readily produced. Additionally,
natural or synthetically produced libraries and compounds are readily modified
through conventional chemical, physical and biochemical means, and may be used
to produce combinatorial libraries. Known pharmacological agents may be
subjected to directed or random chemical modifications, such as acylation,
alkylation, esterification, amidification, etc. to produce structural analogs.

Where the screening assay is a binding assay, one or more of the molecules
may be joined to a label, where the label can directly or indirectly provide a
detectable signal. Various labels include radioisotopes, fluorescers,
chemiluminescers, enzymes, specific binding molecules, particles, e.g. magnetic
particles, and the like. Specific binding molecules include pairs, such as biotin and
streptavidin, digoxin and antidigoxin, etc. For the specific binding members, the
complementary member would normally be labeled with a molecule that provides
for detection, in accordance with known procedures.

A variety of other reagents may be included in the screening assay. These
include reagents like salts, neutral proteins, e.g. albumin, detergents, etc. that are
used to facilitate optimal protein-protein binding and/or reduce non-specific or
background interactions. Reagents that improve the efficiency of the assay, such
as protease inhibitors, nuclease inhibitors, anti-microbial agents, etc. may be used.
The components of the mixture are added in any order that provides for the requisite
binding. Incubations are performed at any suitable temperature, typically between 4
and 40°C. Incubation periods are selected for optimum activity, but may also be
optimized to facilitate rapid high-throughput screening. Typically between 0.1 and 1
hours will be sufficient.

Other assays of interest detect agents that mimic ASTH1 function. For
example, candidate agents are added to a cell that lacks functional ASTH1, and
screened for the ability to reproduce ASTH1 function in a functional assay.

The compounds having the desired pharmacological activity may be
administered in a physiologically acceptable carrier to a host for treatment of
asthma attributable to a defect in ASTH1 function. The compounds may also be
used to enhance ASTH1 function. The therapeutic agents may be administered in
a variety of ways: orally, topically, parenterally, e.g. subcutaneously,
intraperitoneally, by viral infection, intravascularly, etc. Inhaled treatments are of
particular interest. Depending upon the manner of introduction, the compounds
may be formulated in a variety of ways. The concentration of therapeutically active
compound in the formulation may vary from about 0.1-100 wt.%.

The pharmaceutical compositions can be prepared in various forms, such as
granules, tablets, pills, suppositories, capsules, suspensions, salves, lotions and the
like. Pharmaceutical grade organic or inorganic carriers and/or diluents suitable for
oral and topical use can be used to make up compositions containing the
therapeutically-active compounds. Diluents known to the art include aqueous
media, vegetable and animal oils and fats. Stabilizing agents, wetting and
emulsifying agents, salts for varying the osmotic pressure or buffers for securing an
adequate pH value, and skin penetration enhancers can be used as auxiliary
agents.

Pharmacogenetics 

Pharmacogenetics is the linkage between an individual's genotype and that
individual's ability to metabolize or react to a therapeutic agent. Differences in
metabolism or target sensitivity can lead to severe toxicity or therapeutic failure by
altering the relation between bioactive dose and blood concentration of the drug. In
the past few years, numerous studies have established good relationships between
polymorphisms in metabolic enzymes or drug targets, and both response and
toxicity. These relationships can be used to individualize therapeutic dose
administration.

Genotyping of polymorphic alleles is used to evaluate whether an individual
will respond well to a particular therapeutic regimen. The polymorphic sequences
are also used in drug screening assays, to determine the dose and specificity of a
candidate therapeutic agent. A candidate ASTH1 polymorphism is screened with a
target therapy to determine whether there is an influence on the effectiveness in
treating asthma. Drug screening assays are performed as described above.
Typically two or more different sequence polymorphisms are tested for response to
a therapy.

Drugs currently used to treat asthma include beta 2-agonists, 
glucocorticoids, theophylline, cromones, and anticholinergic agents. For acute, 
severe asthma, the inhaled beta 2-agonists are the most effective bronchodilators. 
Short-acting forms give rapid relief; long-acting agents provide sustained relief and 
help nocturnal asthma. First-line therapy for chronic asthma is inhaled 
glucocorticoids, the only currently available agents that reduce airway inflammation.
Theophylline is a bronchodilator that is useful for severe and nocturnal asthma, and
recent studies suggest that it may also have an immunomodulatory effect.
Cromones work best for patients who have mild asthma: they have few adverse
effects, but their activity is brief, so they must be given frequently. Cysteinyl
leukotrienes are important mediators of asthma, and inhibition of their effects may 
represent a potential breakthrough in the therapy of allergic rhinitis and asthma. 

Where a particular sequence polymorphism correlates with differential drug 
effectiveness, diagnostic screening may be performed. Diagnostic methods have 
been described in detail in a preceding section. The presence of a particular 
polymorphism is detected, and used to develop an effective therapeutic strategy for 
the affected individual. 

Experimental 

The following examples are put forth so as to provide those of ordinary skill
in the art with a complete disclosure and description of how to make and use the
subject invention, and are not intended to limit the scope of what is regarded as the
invention. Efforts have been made to ensure accuracy with respect to the numbers
used (e.g. amounts, temperature, concentrations, etc.) but some experimental
errors and deviations should be allowed for. Unless otherwise indicated, parts are
parts by weight; molecular weight is average molecular weight; temperature is in
degrees centigrade; and pressure is at or near atmospheric.



MATERIALS AND METHODS 
Asthma families for genetic mapping studies

Asthma phenotype measurements and blood samples were obtained from
the inhabitants of Tristan da Cunha, an isolated island in the South Atlantic, and
from asthma families in Toronto, Canada (see Zamel et al. (1996) supra). The 282
inhabitants of Tristan da Cunha form a single large extended family descended from
28 original founders. Settlement of Tristan da Cunha occurred beginning in 1817
with soldiers who remained behind when a British garrison was withdrawn from the
island, followed by the survivors of several shipwrecks. In 1827 five women from
St. Helena, one with children, emigrated to Tristan da Cunha and married island
men. One of these women is said to have been asthmatic, and could be the origin
of a genetic founder effect for asthma in this population. Inbreeding has resulted in
kinship resemblances of at least first cousin levels for all individuals.

The Tristan da Cunha family pedigrees were ascertained through review of
baptismal, marriage and medical records, as well as reliably accurate historical
records of the early inhabitants (Zamel (1995) Can. Respir. J. 2:18). The
prevalence of asthma on Tristan da Cunha is high; 23% had a definitive diagnosis
of asthma.

The Toronto cohort included 59 small families having at least one affected
individual. These were ascertained based on the following criteria: (i) an affected
proband; (ii) availability of at least one sibling of the proband, either affected or
unaffected; (iii) at least one living parent from whom DNA could be obtained. A set
of 156 "triad" families consisting of an affected proband and his or her parents was
also collected. Signed consent forms were obtained from each individual prior to
commencement of phenotyping and blood sample collection. The Toronto patients
were mainly of mixed European ancestry.

Clinical characterization

A standardized questionnaire based on that of the American Thoracic
Society (American Lung Association recommended respiratory diseases
questionnaire for use with adults and children in epidemiology research. 1978.
American Review of Respiratory Disease 118(2):7-53) was used to record the
presence of respiratory symptoms such as cough, sputum and wheezing; the
presence of other chest disorders including recent upper respiratory tract infection,
allergic history; asthmatic attacks including onset, offset, confirmation by a
physician, prevalence, severity and precipitating factors; other illnesses and
smoking history; and all medications used within the previous 3 months. A
physician-confirmed asthmatic attack was the principal criterion for a diagnosis of
asthma.

Skin atopy was determined by skin prick tests to common allergens:
A. fumigatus, Cladosporium, Alternaria, egg, milk, wheat, tree, dog, grass, horse,
house dust, cat, feathers, house dust mite D. farinae, and house dust mite
D. pteronyssinus. Atopy testing of Toronto subjects omitted D. pteronyssinus and
added cockroach and ragweed allergens. Saline and histamine controls were also
performed (Bencard Laboratories, Mississauga, Ontario). Antihistamines were
withdrawn for at least 48 hours prior to testing. Wheal diameters were corrected by
subtraction of the saline control wheal diameter, and a corrected wheal size of >3
mm recorded 10 min after application was considered a positive response.
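
The wheal correction and scoring rule just described can be written out as a
short sketch. The function names and the example measurements are hypothetical;
only the saline subtraction and the >3 mm threshold come from the text. The
"at least one positive allergen" rule used in is_atopic follows the atopy
definition given in the linkage analysis section.

```python
POSITIVE_THRESHOLD_MM = 3.0  # corrected wheal size above this is scored positive


def corrected_wheal(allergen_mm, saline_mm):
    """Subtract the saline control wheal diameter from the allergen wheal diameter."""
    return allergen_mm - saline_mm


def is_positive(allergen_mm, saline_mm, threshold_mm=POSITIVE_THRESHOLD_MM):
    """Score one skin prick test: positive when the corrected wheal exceeds the threshold."""
    return corrected_wheal(allergen_mm, saline_mm) > threshold_mm


def is_atopic(wheals_mm, saline_mm):
    """Return True when at least one allergen gives a positive corrected wheal."""
    return any(is_positive(w, saline_mm) for w in wheals_mm.values())


# Hypothetical readings in mm, for illustration only.
example_wheals = {"cat": 5.0, "dog": 2.0, "grass": 4.0}
example_result = is_atopic(example_wheals, saline_mm=1.0)
```

With a 1 mm saline wheal, only the 5 mm reading in this example gives a
corrected wheal above 3 mm; the 4 mm reading corrects to exactly 3 mm and is
therefore not scored positive.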

Airway responsiveness was assessed by a methacholine challenge test in
those subjects with a baseline FEV1 (forced expiratory volume in one second) >
70% of predicted (Crapo et al. (1981) Am. Rev. Respir. Dis. 123:659).
Methacholine challenge response was determined using the tidal breathing method
(Cockcroft et al. (1977) Clin. Allergy 7:235). Doubling doses of methacholine from
0.03 to 16 mg/ml were administered using a Wright nebulizer at 4-min intervals to
measure the provocative concentration of methacholine producing a 20% fall in
FEV1 (PC20). If FEV1 was <70% of predicted, a bronchodilator response to 400
µg salbutamol aerosol was used to determine airway responsiveness. Both
methacholine challenges and bronchodilator responses were measured using a
computerized bronchial challenge system (S&M Instrument Co. Inc., Doylestown, PA)
consisting of a software package and interface board installed in a Toshiba T1850C
laptop computer and connected to a flow sensor (RS232FS). The power source for
instruments used on Tristan da Cunha has been described (Zamel et al. (1996)
supra). Increased airway responsiveness was defined as a PC20 < 4.0 mg/ml or a
> 15% improvement in FEV1 15 min postbronchodilator. Participants were asked to
withhold bronchodilators at least 8 h before testing; inhaled or systemic steroids
were maintained at the usual dosage. Subjects with a history of an upper
respiratory tract infection within a month of testing were rechallenged at a later date.
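
As an illustration of how a PC20 value may be derived from the doubling-dose
series and then classified, the sketch below uses log-linear interpolation
between the two concentrations that bracket a 20% fall in FEV1. The
interpolation formula is a commonly used convention and is an assumption here;
the text does not state how the bronchial challenge software computed PC20.
Function and parameter names are hypothetical.

```python
import math


def pc20(c1, c2, r1, r2):
    """Log-linear interpolation of the concentration giving a 20% fall in FEV1.

    c1, r1: last concentration (mg/ml) and % fall in FEV1 below 20%
    c2, r2: first concentration (mg/ml) and % fall in FEV1 at or above 20%
    """
    if not (0 < c1 < c2) or not (r1 < 20.0 <= r2):
        raise ValueError("need 0 < c1 < c2 and r1 < 20 <= r2")
    log_c1, log_c2 = math.log10(c1), math.log10(c2)
    log_pc20 = log_c1 + (log_c2 - log_c1) * (20.0 - r1) / (r2 - r1)
    return 10 ** log_pc20


def increased_responsiveness(pc20_mg_ml=None, bronchodilator_gain_pct=None):
    """Apply the stated criteria: PC20 < 4.0 mg/ml or > 15% FEV1 improvement."""
    if pc20_mg_ml is not None and pc20_mg_ml < 4.0:
        return True
    if bronchodilator_gain_pct is not None and bronchodilator_gain_pct > 15.0:
        return True
    return False


# Hypothetical subject: 10% fall at 1 mg/ml, 30% fall at 2 mg/ml.
example_pc20 = pc20(c1=1.0, c2=2.0, r1=10.0, r2=30.0)
example_flag = increased_responsiveness(pc20_mg_ml=example_pc20)
```

Interpolating on the logarithm of concentration matches the doubling-dose
design, in which successive steps are equally spaced on a log scale.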

Genotyping

PCR primer pairs were synthesized using an Applied Biosystems 394
automated oligo synthesizer. The forward primer of each pair was labeled with
either FAM, HEX, or TET phosphoramidites (Applied Biosystems). No oligo
purification step was performed.

Genomic DNA was extracted from whole blood. PCR was performed using
PTC100 thermocyclers (MJ Research). Reactions contained 10 mM Tris-HCl, pH
8.3; 1.5-3.0 mM MgCl2; 50 mM KCl; 0.01% gelatin; 250 µM each dGTP, dATP,
dTTP, dCTP; 20 µM each PCR primer; 20 ng genomic DNA; and 0.75 U Taq
Polymerase (Perkin Elmer Cetus) in a final volume of 20 µl. Reactions were
performed in 96 well polypropylene microtiter plates (Robbins Scientific) with an
initial 94°C, 3 min. denaturation followed by 35 cycles of 30 sec. at 94°C, 30 sec. at
the annealing temp., and 30 sec. at 72°C, with a final 2 min. extension at 72°C
following the last cycle. Dye label, annealing temperature, and final magnesium
concentration were specific to the individual marker.
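
The cycling profile above can be captured as data, which makes it easy to check
the programmed run length. The sketch below is illustrative only; the constant
and function names are hypothetical, and ramp times between temperatures are
ignored because the text does not give them.

```python
# Programmed hold times taken from the profile described in the text.
INITIAL_DENATURATION_S = 3 * 60   # 94 C, 3 min
CYCLES = 35
STEP_SECONDS = {
    "denature_94C": 30,
    "anneal_marker_specific": 30,
    "extend_72C": 30,
}
FINAL_EXTENSION_S = 2 * 60        # 72 C, 2 min


def programmed_hold_seconds():
    """Sum of programmed hold times; ramping between temperatures is ignored."""
    per_cycle = sum(STEP_SECONDS.values())
    return INITIAL_DENATURATION_S + CYCLES * per_cycle + FINAL_EXTENSION_S


total_minutes = programmed_hold_seconds() / 60
```

Under these assumptions the holds add up to 3450 seconds, i.e. 57.5 minutes per
plate before ramping is taken into account.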

Dye label intensity and quantity of PCR product (as assessed on agarose
gels) were used to determine the amount to be pooled for each marker locus. The
pooled products were precipitated and the product pellets mixed with 0.4 µl
Genescan 500 Tamra size standard, 2 µl formamide, and 1 µl ABI loading dye.
Plates of PCR product pools were heated to 80°C for 5 minutes and immediately
placed on ice prior to gel loading.

PCR products were electrophoresed on denaturing 6% polyacrylamide gels
at a constant 1000 volts using ABI 373a instruments. Peak detection, sizing, and
stutter band filtering were achieved using Genescan 1.2 and Genotyper 1.1
software (Applied Biosystems). Genotype data were subsequently submitted to
quality control and consistency checks (Hall et al. (1996) Genome Res. 6:781).

Genotyping of 'saturation' markers in the ASTH1 region was done by the
method described above with several exceptions. In most cases, the unlabeled
primer of each pair was modified with the sequence GTTTCTT at the 5' end (Smith
et al. (1995) Genome Res. 5:312). Amplitaq Gold (Perkin Elmer Cetus) and buffer D
(2.5 mM MgCl2, 33.5 mM Tris-HCl pH 8.0, 8.3 mM (NH4)2SO4, 25 mM KCl, 85 µg/ml
BSA) were used in the PCR. A 'touchdown' amplification profile was employed in
which the annealing temperature began at Se^'G and decreased one degree per 
cycle to a final 20 cycles at 56°C. Products were run on 4.25% polyacrylamide gels
using ABI 377 instruments. The data were processed with Genescan 2.1 and
Genotyper 1.1 software.
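
A touchdown profile of the kind described, in which the annealing temperature
falls by one degree per cycle and is then held for a final 20 cycles, can be
generated as in the sketch below. The function name is hypothetical, and the
starting temperature passed in the example is a placeholder chosen purely for
illustration; it is not a value taken from the text.

```python
def touchdown_annealing_temps(start_c, final_c, final_cycles=20, step_c=1):
    """List the annealing temperature for every cycle of a touchdown profile.

    The temperature drops by step_c each cycle from start_c until it reaches
    final_c, which is then held for final_cycles cycles.
    """
    if start_c < final_c or step_c <= 0 or (start_c - final_c) % step_c != 0:
        raise ValueError("start_c must sit a whole number of steps above final_c")
    ramp = list(range(start_c, final_c, -step_c))  # excludes final_c itself
    return ramp + [final_c] * final_cycles


# 66 is a placeholder starting temperature, assumed for illustration only.
temps = touchdown_annealing_temps(start_c=66, final_c=56)
```

Starting high favours specific priming in the early cycles, and the later
cycles at the lower temperature then amplify the product efficiently.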

The Genome Scan 

A genome scan was performed in the population of Tristan da Cunha using
274 polymorphic microsatellite markers chosen from among those developed at
Oxford (Reed et al. (1994) Nature Genetics 7:390), Genethon (Dib et al. (1996)
Nature 380:152) and the Cooperative Human Linkage Center (CHLC, Murray et al.
(1994) Science 265:2049). Markers with heterozygosity values of 0.75 or greater
were selected to cover all the human chromosomes, as well as for ease of
genotyping and size of PCR product for multiplexing of markers on gels. Fifteen
multiplexed sets were used to provide a ladder of PCR products in each of three
dyes when separated by size. Published distances were used initially to estimate
map resolution. More accurate genetic distances were calculated using the study
population as the data were generated. The 274 markers gave an average 14 cM
interval for the genome scan.
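
For reference, the heterozygosity criterion used for marker selection can be
illustrated with the standard expected-heterozygosity formula, H = 1 - sum of
squared allele frequencies. The text does not say how the published
heterozygosity values were computed, so this formula is an assumption, and the
function names and allele frequencies below are hypothetical.

```python
def expected_heterozygosity(allele_freqs):
    """H = 1 - sum(p_i^2) over the allele frequencies of one marker."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


def passes_filter(allele_freqs, min_h=0.75):
    """Keep markers whose heterozygosity is 0.75 or greater, as in the text."""
    return expected_heterozygosity(allele_freqs) >= min_h


# Hypothetical markers: four equally frequent alleles versus two.
four_allele_h = expected_heterozygosity([0.25, 0.25, 0.25, 0.25])
two_allele_h = expected_heterozygosity([0.5, 0.5])
```

A marker needs at least four roughly equally frequent alleles to reach the 0.75
threshold, which is why highly polymorphic microsatellites suit this purpose.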
Linkage analysis

Parametric linkage analyses of marker data were conducted using the
methods of Haseman and Elston (1972) Behav. Genet. 2:3, and FASTLINK
(Schaffer et al. (1996) Hum. Hered. 46:226), assuming a dominant mode of
transmission with incomplete penetrance. Linkage was tested to three primary
phenotypes, namely asthma diagnosis (history), airway responsiveness (PC20 < 4
mg/ml for methacholine challenge) and atopy (one or more skin-prick tests which
yielded a wheal diameter > 3 mm), and to combinations of these.
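
The core idea of the Haseman and Elston method can be sketched briefly: the
squared trait difference of each sib pair is regressed on the proportion of
marker alleles the pair shares identical by descent (IBD), and a negative slope
is taken as evidence for linkage. The sketch below is a bare least-squares
illustration of that regression, not the analysis pipeline actually used; the
function name and the example data are hypothetical, and no significance test is
included.

```python
def haseman_elston_slope(sib_pairs):
    """Least-squares slope of squared sib-pair trait difference on IBD sharing.

    sib_pairs: sequence of (trait_sib1, trait_sib2, ibd_proportion), where
    ibd_proportion is the estimated share of marker alleles identical by
    descent (0, 0.5 or 1 when the marker is fully informative).
    """
    pairs = list(sib_pairs)
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two sib pairs")
    xs = [pi for _, _, pi in pairs]
    ys = [(a - b) ** 2 for a, b, _ in pairs]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("IBD sharing must vary across pairs")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical data: pairs sharing more alleles IBD have more similar traits.
example_pairs = [(2.0, 0.0, 0.0), (1.0, 0.0, 0.5), (0.0, 0.0, 1.0)]
example_slope = haseman_elston_slope(example_pairs)
```

In this toy data set the slope is negative, the pattern expected when the
marker lies near a locus influencing the trait.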

Small scale yeast artificial chromosome (YAC) DNA preparation

Small scale isolation of YAC DNA for STS mapping was done by a procedure 
which uses glass beads and physical shearing to damage the yeast cell wall 
(Scherer and Tsui (1991) Cloning and analysis of large DNA molecules. In
Advanced Techniques in Chromosome Research. (K.W. Adolph, ed.) pp. 33-72. 
Marcel Dekker, Inc. New York, Basel, Hong Kong.) 

YAC block prep and pulsed field gel electrophoresis (PFGE)

A 50 ml culture of each YAC was grown in 2 x AHC at 30°C. The cells were
pelleted by centrifugation and washed twice in sterile water. After resuspension of
the cells in 4 ml of SCEM (1 M sorbitol, 0.1 M sodium citrate (pH 5.8), 10 mM
EDTA, 30 mM β-mercaptoethanol), 5 ml of 1.2% low melting temperature agarose
in SCEM was added, mixed, pipetted into 100 µl plug molds and allowed to solidify.
Plugs were incubated overnight in 50 ml of SCEM containing 30 U/ml lyticase
(Sigma). Plugs were rinsed 3 times in TE (10 mM Tris pH 8.0, 1 mM EDTA) and
incubated twice for 12 hours each at 50°C in lysis solution (0.5 M EDTA, pH 8.0;
1% w/v sodium lauryl sarcosine; 0.5 mg/ml proteinase K). They were washed 5
times with TE and stored in 0.5 M EDTA (pH 8.0) at 4°C.
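
The final agarose concentration in the plugs follows from the volumes given
above by simple dilution (C1 x V1 = C2 x V2). The sketch below works this out;
the function name is hypothetical, and the calculation assumes the cell
suspension and agarose volumes are additive.

```python
def final_concentration(stock_pct, stock_ml, diluent_ml):
    """C1*V1 = C2*V2 rearranged for the final concentration C2."""
    total_ml = stock_ml + diluent_ml
    if total_ml <= 0:
        raise ValueError("total volume must be positive")
    return stock_pct * stock_ml / total_ml


# 5 ml of 1.2% agarose mixed with 4 ml of cell suspension in SCEM.
plug_agarose_pct = final_concentration(stock_pct=1.2, stock_ml=5.0, diluent_ml=4.0)
```

Under that assumption the plugs contain roughly 0.67% agarose.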

YACs and yeast chromosomes were separated on pulsed field gels using a 
CHEF Mapper (BIO-RAD) and according to methods supplied by the manufacturer, 
then transferred to nitrocellulose. YACs which comigrated with yeast chromosomes 
were visualized by hybridization of the blot with radiolabelled YAC vector 
sequences (Scherer and Tsui (1991) supra).

Hybridization of YAC DNA to bacterial artificial chromosome (BAC) and cosmid
grids

Size-purified YAC DNA was prepared by pulsed field gel electrophoresis on a
low melting temperature Seaplaque GTG agarose (FMC) gel, purified by
GeneClean (BIO101) and radiolabeled for 30 mins with 32P-dCTP using the Prime-It
II kit (Stratagene). 50 µl of water was added and unincorporated nucleotide was
removed by Quick Spin Column (Boehringer Mannheim). 23 µl of 11.2 mg/ml
human placental DNA (Sigma) and 36 µl of 0.5 M Na2HPO4, pH 6.0 were added to
the approximately 150 µl of eluant. The probe was boiled for 5 mins and incubated
at 65°C for exactly 3 hours, then added to the prehybridized gridded BAC (Shizuya
et al. (1992) Proc. Natl. Acad. Sci. 89:8794; purchased from Research Genetics) or
chromosome 11 cosmid [Resource Center/ Primary Database of the German
Human Genome Project, Berlin; Lehrach et al. (1990), In Davies, K.E. and
Tilghman, S.M. (eds.), Genome Analysis Volume 1: Genetic and Physical Mapping.
Cold Spring Harbor Laboratory Press, Cold Spring Harbor, pp. 39-81] filters in
dextran sulfate hybridization mix (10% dextran sulfate, 1% SDS, 1 M NaCl).
Hybridizations were at 65°C for 12 - 48 hours, followed by 2 washes at room
temperature in 2X SSC for 10 mins each, and 3 washes at 65°C in 0.2X SSC, 0.2%
SDS for 20 mins each.

Metaphase fluorescence in situ hybridization (FISH) and direct visual in situ 
hybridisation (DIRVISH) 

Metaphase FISH was carried out by standard methods (Heng and Tsui
(1994) FISH detection on DAPI banded chromosomes. In Methods of Molecular
Biology: In Situ Hybridisation Protocols (K.H.A. Choo, ed.) pp. 35-49. Humana Press,
Clifton, N.J.). High resolution FISH, or DIRVISH, was used to map the relative
positions of two or more clones on genomic DNA. The protocol used was as
described by Parra and Windle (1993) Nature Genet. 5:17. Briefly, slides
containing stretched DNA were prepared by adding 2 µl of a suspension of normal
human lymphoblast cells at one end of a glass slide and allowing to dry. 8 µl lysis
buffer (0.5% SDS, 50 mM EDTA, 200 mM Tris-HCl, pH 7.4) was added and the
slide incubated at room temperature for 5 minutes. The slide was tilted so that the
DNA ran down the slide, then dried. The DNA was fixed by adding 400 µl 3:1
methanol/acetic acid. Probes were labeled either with biotin or with digoxigenin by
standard nick translation (Rigby et al. (1977) J. Mol. Biol. 113:237). Hybridization
and detections were carried out using standard fluorescence in situ hybridization
techniques (Heng and Tsui (1994) supra). Results were visualised using a
Mikrophot SA microscope (Nikon) equipped with a CCD camera (Photometrics).
Images were recorded using Smartcapture software (Vysis).

Gap filling 

Clones flanking gaps in the map were end cloned by digestion with enzymes
that do not cut the respective vector sequences (NsiI for BAC clones and XbaI for
PAC clones), followed by religation and transformation into competent DH5α.
Clones which produced two end fragments and plasmid vector upon digestion with
NotI and NsiI or XbaI were sequenced. Gaps in the tiling path were filled by
screening a gridded BAC library with the end clone probes or by screening DNA
pools of a human genomic PAC library (Ioannou et al. (1994) Nature Genetics 6:84;
licensed from Health Research, Inc.) by PCR using primers designed from end
clone sequences.

Direct cDNA selection 

Direct cDNA selection (Lovett et al. (1991) Proc. Natl. Acad. Sci. 88:9628)
was carried out using cDNA derived from both adult whole lung tissue and fetal
whole lung tissue (Clontech). 5 µg of Poly(A)+ RNA was converted to double
stranded cDNA using the Superscript Choice System for cDNA synthesis and the
supplied protocol (Gibco BRL). First strand priming was achieved by both oligo(dT)
and random hexamers. The resulting cDNA was split into 2 equal aliquots and
digested with either MboI or TaqI prior to the addition of specific linker primers.
Linker primers for MboI-digested DNA were as described by Morgan et al. (1992)
Nucleic Acids Res. 20:5173. Linker primers for TaqI-digested DNA were a
modification of these:
(SEQ ID NO:336) TaqIa: 5'-CGAGAATTCACTCGAGCATCAGG;
(SEQ ID NO:337) TaqIb: 5'-CCTGATGCTCGAGTGAATTCT. The modified cDNA
was ethanol precipitated and resuspended in 200 µl of H2O. 1 µl of cDNA was
amplified with the linker primer MboIb in a 100 µl PCR reaction. The resulting
cDNA products, approximately 1 µg, were blocked with 1 µg of COT1 DNA (Gibco
BRL) for 4 hours at 60°C in 120 mM NaPO4 buffer, pH 7.0.
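
The relationship between the two TaqI linker oligonucleotides can be checked
computationally. The sketch below tests whether SEQ ID NO:337 is complementary
to SEQ ID NO:336 and reports any unpaired 5' bases; the helper and variable
names are hypothetical. The interpretation that the unpaired bases form a
TaqI-compatible end is an inference from the known TaqI cleavage pattern
(T^CGA, leaving a 5'-CG overhang) and is not stated in the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


LINKER_A = "CGAGAATTCACTCGAGCATCAGG"  # SEQ ID NO:336
LINKER_B = "CCTGATGCTCGAGTGAATTCT"    # SEQ ID NO:337

# Linker B pairs with the 3' portion of linker A when it matches the start of
# A's reverse complement; the remaining 5' bases of A stay single stranded.
paired = reverse_complement(LINKER_A).startswith(LINKER_B)
overhang = LINKER_A[: len(LINKER_A) - len(LINKER_B)]
```

For these two sequences the shorter oligonucleotide pairs fully with the longer
one, leaving the two 5' bases CG of SEQ ID NO:336 unpaired.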

Approximately 1 µg of the appropriate genomic clones was biotinylated using
the BioNick Labeling System (Gibco BRL). Unincorporated biotin was removed by
spin column chromatography. Approximately 100 ng of biotinylated genomic DNA
was denatured and allowed to hybridize to 1 µg of blocked cDNA in a total volume
of 20 µl in 120 mM NaPO4 for 60 hours at 60°C under mineral oil. After
hybridization, the biotinylated DNA was captured on streptavidin-coated magnetic
beads (Dynal) in 100 µl of binding buffer (1 M NaCl, 10 mM Tris, pH 7.4, 1 mM
EDTA) for 20 minutes at room temperature with constant rotation. Two 15 minute
washes at room temperature with 500 µl of 1X SSC/0.1% SDS were followed by
four washes for 20 minutes at 65°C with 500 µl of 0.1X SSC/0.1% SDS with
constant rotation. After each wash, the beads were collected on the side of the tube
using magnet separation and the supernatant was removed with a pipette.
Following the last wash, the beads were briefly rinsed once with wash solution prior
to eluting the bound cDNA with 50 µl of 0.1 M NaOH for 10 minutes at room
temperature. The supernatant was removed and neutralized with 50 µl 1 M Tris pH
7.4. The primary selected cDNA was desalted using a Sephadex G-50 column
(Boehringer Mannheim). PCR was performed on 1, 2, 5, and 10 µl of eluate with
MboIb primers. Amplified products were analyzed on a 1.4% agarose gel. The
reaction with the cleanest bands and least background was scaled up to produce
approximately 1 µg of primary selected cDNA. This amplified primary selected
cDNA was blocked with 1 µg of COT1 at 60°C for 1 hour followed by a second
round of hybridization to 100 ng of the appropriate genomic DNA under the same
conditions as the first round of selection. Washing of the bound cDNA, elution, and
PCR of the selected cDNA were identical to the first round. 1 µl of PCR amplified
secondary selected cDNA was cloned using the TA cloning system according to the
manufacturer's protocol (Invitrogen). Colonies were picked into 96-well microtiter
plates and grown overnight prior to sequencing.



Exon Trapping 

Exon trapping was performed by the method of Buckler et al. (1991) Proc.
Natl. Acad. Sci. USA 88:4005, with modifications described in Church et al. (1994)
Nature Genetics 6:98. Each BAC clone of the minimal set of clones required to
cover the ASTH1 region (i.e. the tiling path) was subject to exon trapping
separately. Briefly, restriction fragments (PstI or BamHI/BglII) of each cosmid were
shotgun subcloned into PstI- or BamHI-digested and phosphatase-treated pSPL3B
which had been modified as in Burns et al. (1995) Gene 161:183 (GIBCO BRL).
Ligations were electroporated into ElectroMax HB101 cells (Gibco BRL) and plated
on 20 cm diameter LB ampicillin plates. DNA was prepared from plates with > 2000
colonies by collection of the bacteria in LB ampicillin liquid and plasmid DNA
purification by a standard alkaline lysis protocol (Sambrook et al. (1989) supra). 5
µg of DNA from each plasmid pool preparation were electroporated into Cos 7 cells
(ATCC) and RNA harvested using TRIZOL (Gibco BRL) after 48 hours of growth.
RT-PCR products were digested with BstXI prior to a second PCR amplification.
Products were cloned into pAMP10 (Gibco BRL) and transformed into DH5 cells
(Gibco BRL). 96 colonies per BAC were picked and analyzed for insert size by
PCR.

Northern blot hybridisation

Northern hybridisation was performed using Multiple Tissue Northern (MTN)
blots (Clontech). DNA probes were radioactively labeled by random priming
[Feinberg and Vogelstein (1984) Anal. Biochem. 137:266] using the Prime-It II kit
(Stratagene). Hybridizations were performed in ExpressHyb hybridisation solution
(Clontech) according to the manufacturer's recommendations. Filters were exposed
to autoradiographic film overnight or for 3 days.

cDNA library screening

Phage cDNA libraries were plated and screened with radiolabeled probes
(exon trapping or cDNA selection products amplified by PCR from plasmids
containing these sequences) by standard methods (Sambrook et al. (1989) supra).

Rapid amplification of cDNA ends (RACE)

RACE libraries were constructed using polyA+ RNA and the Marathon cDNA 
amplification kit (Clontech). Nested RACE primer sets were designed for each 
cDNA or potential gene fragment (trapped exon, predicted exon, conserved
fragment, etc.). The RACE libraries were tested by PCR using one primer pair for
each potential gene fragment; the two strongly positive libraries were chosen for 
RACE experiments. 

Genomic sequencing 

DNA from cosmid, PAC, and BAC clones was prepared using Qiagen DNA
prep kits and further purified by CsCl gradient. DNA was sonicated and DNA
fragments were repaired using nuclease BAL-31 and T4 DNA polymerase. DNA 
fragments of 0.8-2.2 kb were size-fractionated by agarose gel electrophoresis and 
ligated into pUC9 vector. Inserts of the plasmid clones were amplified by PCR and 
sequenced using standard ABI dye-primer chemistry. 

ABI sample file data was reanalyzed using Phred (Phil Green, University of 
Washington) for base calling and quality analysis. Sequence assembly of 
reanalyzed sequence data was accomplished using Phrap (Phil Green, University 
of Washington). Physical gaps between assembled contigs and unjoined but
overlapping contigs were identified by inspection of the assembled data using GFP
(licensed from Baylor College of Medicine) and Consed (Phil Green, University of
Washington). Material for sequence data generation across gaps was obtained by
PCR amplification. Low coverage regions were resequenced using dye-primer and
dye-terminator chemistries (ABI). Final base-perfect editing (to > 99% accuracy)
was accomplished using Consed.

Single stranded conformational polymorphism (SSCP) analysis

PCR primers flanking each exon of the ASTH1I and ASTH1J genes, or more
than one primer pair for large exons, were designed from the genomic sequence
generated, using Primer (publicly available from the Whitehead Institute for
Biomedical Research) or Oligo 4.0 (licensed from National Biosciences).
Radioactive SSCP was performed by the method of Orita et al. (1989, Proc. Natl.
Acad. Sci. 86:2766). Briefly, radioactively labeled PCR products between 150 and
300 bp and spanning exons of the ASTH1I and ASTH1J genes were generated
from a set of asthma patient and control genomic template DNAs, by incorporating
α-32P dGTP in the PCR. PCR reactions (20 µl) included 1x reaction buffer, 100 µM
dNTPs, 1 each forward and reverse primer, 1 unit Taq DNA polymerase
(Perkin-Elmer) and 1 µCi α-32P dGTP. A brief denaturation at 94°C was followed by
30-32 cycles of: 94°C for 30 sec, 30 sec at the annealing temperature, and 72°C for
30 sec; followed by 5 mins at 72°C. Radiolabeled PCR products were diluted 1:20 in
water, mixed with an equal volume of denaturing loading dye (95% formamide,
0.25% bromophenol blue), and denatured for 10 minutes at 80°C immediately prior
to electrophoresis. 0.5x MDE (FMC) gels with and without 8% glycerol in 1x TBE
were run at 8-12 Watts for 16-20 hours at room temperature. Dried gels were
exposed to autoradiographic film (Kodak XAR) for 1-2 days at -80°C. PCR products
from individuals carrying SSCP variants were subcloned into the pCR2.1 or
pZeroBlunt plasmid vector (Invitrogen). Inserts of the plasmid clones were amplified
by PCR and sequenced using standard ABI dye-primer chemistry to determine the
nature of the sequence variant responsible for the conformational changes detected
by SSCP.

Fluorescent SSCP was carried out according to the recommended ABI
protocol (ABI User Bulletin entitled 'Multi Color Fluorescent SSCP'). Unlabeled
PCR primers were used to amplify genomic DNA segments containing different
exons of the ASTH1I or ASTH1J genes, in patient or control DNA. Nested
fluorescently labeled (TET, FAM or HEX) primers were then used to amplify smaller
products, 150 to 300 bp, containing the exon or region of interest. Amplification was
done using a 'touchdown' PCR protocol, in which the annealing temperature
decreased from 57°C to 42°C, and Amplitaq Gold polymerase (Perkin Elmer,
Cetus). In most cases the fluorescently labeled primers were identical in sequence
to those used for conventional radioactive SSCP. The fluorescent PCR products
were diluted and mixed with denaturing agents, GeneScan size standard
(Genescan 500 labelled with Tamra) and Blue dextran dye. Samples were heated
at 90°C and quick chilled on ice prior to loading on 6.5% standard or 0.5x MDE
(manufacturer) polyacrylamide gels containing 2.5% glycerol and run using
externally temperature controlled modified ABI 377 instruments. Gels were run at
1240 V and 20°C for 7-9 hrs and analyzed using GeneScan software (ABI).

Comparative (heterozygote detection) sequencing 

Unlabeled PCR primers were used to amplify genomic DNA segments 
containing different exons of the ASTH1I or ASTH1J genes, from patient or control 
DNAs. A set of nested PCR primers was then used to reamplify the fragment. 
Unincorporated primers were removed from the PCR product by Centricon-100 
column (Amicon), or by Centricon-30 column for products less than 130 bp. The 
nested primers and dye terminator sequencing chemistry (ABI PRISM dye 
terminator cycle sequencing ready reaction kit) were then used to cycle sequence 
the exon and flanking region. Volumes were scaled down to 5 µl and 10% DMSO
was added to increase peak height uniformity. Sequences were compared between
samples and heterozygous positions detected by visual inspection of 
chromatograms and using Sequence Navigator (licensed from ABI). 

For some exons, PCR products were also compared by subcloning and 
sequencing, and comparison of sequences for ten or more clones. 

RESULTS

Genome scanning and linkage analysis

A genome scan was performed using polymorphic microsatellite markers
from throughout the human genome, and DNA isolated from blood samples drawn
from the inhabitants of Tristan da Cunha. Linkage analysis, an established
statistical method used to map the locations of genes and markers relative to other
markers, was applied to verify the marker orders and relative distances between
markers on all human chromosomes, in the Tristan da Cunha population. Linkage
analysis can detect cosegregation of a marker with disease, and was used as a
means to detect genes influencing the development of asthma in this population.
The most highly significant linkage in the genome scan (p = 0.0001 for history of
asthma and p = 0.0009 for methacholine challenge) was obtained at D11S907, a
marker on the short arm of chromosome 11. This significant linkage result indicated
that a gene influencing predisposition to asthma in the Tristan da Cunha population
was located near D11S907.

Replication of this finding was obtained in a collection of asthma families
from Toronto, in which D11S907 and several nearby markers were tested for
linkage. The significant linkage seen (p = 0.001 for history of asthma and p = 0.05
for methacholine challenge) supported the mapping of an asthma gene near
D11S907 and indicated that the gene was likely to be relevant in the more diverse
outbred Toronto group as well as in the inbred population of Tristan da Cunha.

The approximate genetic location of the ASTH1 gene in the Tristan da
Cunha population was confirmed by genotyping and analyzing data from several
markers near D11S907, spaced at intervals no greater than 5 cM across a possible
linked region of about 30 cM. Sib-pair and affected pedigree member linkage
analyses of these markers yielded confirmatory evidence for linkage and refined the
genetic interval.



Physical mapping at ASTH1: YAC contig construction 

Yeast artificial chromosome (YAC) clones were derived from the CEPH
megaYAC library (Cohen et al. (1993) Nature 366:698). Individual YAC addresses
were obtained from a public physical map of CEPH megaYAC STS (sequence
tagged site; Olson et al. (1989) Science 245:1434) mapping data maintained by the
Whitehead Institute and accessible through the world wide web (Cohen et al., 1993,
supra; http://www-genome.wi.mit.edu/cgi-bin/contig/phys_map). YAC clones
spanning or overlapping other YACs containing D11S907 were chosen for map
construction; STSs mapping to these YACs were used for map and clone
verification. Some YACs annotated in the public database as being chimeric were
excluded from the analyses. Multiple colonies of each YAC, obtained from a freshly
streaked plate inoculated from the CEPH megaYAC library masterplate, were
scored using STS markers from the ASTH1 region. These markers included
polymorphic microsatellite repeats, expressed sequence tags (ESTs) and STSs.
Comparison of STS mapping data for each clone with the public map allowed
choice of the individual clone which retained the greatest number of ASTH1 region
STSs, and was therefore least likely to be deleted. YAC addresses for which
clones differed in STS content were interpreted to be prone to deletion; those for
which a subset of clones contained no ASTH1 region STSs were presumed to be
contaminated with yeast cells containing a YAC from another region of the genome.
Chimerism of the chosen clones was assessed by metaphase fluorescent in situ
hybridization (FISH). Their sizes were determined by pulsed field gel
electrophoresis (PFGE), Southern blotting and hybridization with a YAC vector
probe. The PFGE analyses also showed that no YAC clone chosen contained
more than one yeast artificial chromosome.

An STS map based on assuming the least number of deletions in the YAC 
clones was generated. The STS marker order was in agreement with that of the 
Whitehead map. The STS retention pattern of individual YACs, however, was 
slightly different from that of the public data. In general, the chosen clones were 
positive for a greater number of ASTH1 region markers, showing that the data set was
likely to have fewer false negatives than the public map. Non-chimeric YAC clones
spanning the region of greatest interest were chosen for use as hybridization
probes for the identification of smaller BAC, PAC, P1 or cosmid clones from the
region. 

Conversion to a plasmid-based clone map 

The YAC map at ASTH1 provided continuous coverage of a 4 Mb region, the
central 1 Mb of which was of greatest interest. YAC clones comprising a minimal
tiling path of this region were chosen, and the size-purified artificial chromosomes
were used as hybridization probes to identify BAC and cosmid clones. Gridded
filters of a 3x human genomic BAC library and of a human chromosome 11-specific
cosmid library were hybridized with radiolabeled purified YAC. Clones
corresponding to the grid coordinates of the positives were streaked to colony
purity, and filters were gridded with four clones of each BAC or cosmid. These
secondary filters were hybridized with size-purified YAC DNAs. A proportion of both
the BACs and cosmids were found to be non-clonal by these analyses. A positively
hybridizing clone of each was chosen for further analysis.
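
The selection of a minimal tiling path can be illustrated computationally. The following is a minimal sketch only, assuming each clone has already been placed on the map as a (left, right) coordinate pair; the function name, the clone names and the coordinates are hypothetical and are not taken from the mapping data described here. It applies the standard greedy interval-cover strategy: at each step, among clones that start at or before the current covered position, the clone reaching furthest to the right is taken.

```python
def minimal_tiling_path(clones, start, end):
    """Return clone names forming a minimal tiling path over [start, end].

    clones: dict mapping clone name -> (left, right) map coordinates.
    Raises ValueError if the clones leave a gap in the region.
    """
    ordered = sorted(clones.items(), key=lambda item: item[1][0])
    path = []
    covered = start
    i = 0
    while covered < end:
        best = None
        # Examine every clone that begins at or before the covered position
        # and remember the one extending furthest to the right.
        while i < len(ordered) and ordered[i][1][0] <= covered:
            name, (_, right) = ordered[i]
            if best is None or right > best[1]:
                best = (name, right)
            i += 1
        if best is None or best[1] <= covered:
            raise ValueError(f"gap in clone coverage at position {covered}")
        path.append(best[0])
        covered = best[1]
    return path


# Hypothetical clones with coordinates in kb.
example_clones = {
    "A": (0, 400),
    "B": (100, 300),
    "C": (350, 800),
    "D": (700, 1000),
    "E": (380, 720),
}
print(minimal_tiling_path(example_clones, 0, 1000))
```

A gap in coverage, such as the three gaps noted in the BAC map, would surface in this sketch as a ValueError naming the position at which no clone extends the path.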

The BAC and cosmid clones were STS mapped to establish overlaps
between the clones. The BACs were further localized by DIRVISH. BACs which
did not contain an STS marker were mapped in pairwise fashion by simultaneous
two-color DIRVISH with another BAC. The map produced had three gaps which
were subsequently filled by end cloning and hybridization of the end clones to a
human genomic PAC library. Genetic refinement of the ASTH1 region had
occurred concurrently with mapping, rendering it unnecessary to extend the
BAC-contigged region. Mapping data was recorded in ACeDB (Eeckman and Durbin
(1995) Methods Cell Biol. 48:583).

Genomic sequencing and gene prediction 

A minimal tiling path of BAC and cosmid clones was chosen for genomic
sequencing. Over 1 Mb of genomic sequence was generated at ASTH1. On
average, sequencing was done to 12x coverage (12 times redundancy in
sequences). Marker order was verified relative to the STS map.
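
The meaning of 12x coverage can be made concrete with a short calculation. This is an illustrative sketch only: the mean read length of 500 bp and the read count are assumed values chosen for the example, not figures reported here, and the uncovered-fraction estimate uses the idealized Poisson (Lander-Waterman) model of random shotgun reads rather than any analysis performed in this work.

```python
import math


def fold_coverage(n_reads, mean_read_length, region_length):
    """Average redundancy: total bases sequenced divided by region length."""
    return n_reads * mean_read_length / region_length


def reads_needed(coverage, mean_read_length, region_length):
    """Number of reads required to reach a target fold coverage."""
    return math.ceil(coverage * region_length / mean_read_length)


def expected_uncovered_fraction(coverage):
    """Poisson estimate of the fraction of bases hit by no read at all."""
    return math.exp(-coverage)


# Assumed values: 1 Mb region, 500 bp mean read length (hypothetical).
region = 1_000_000
read_length = 500
print(reads_needed(12, read_length, region))
print(fold_coverage(24_000, read_length, region))
print(expected_uncovered_fraction(12))
```

Under these assumptions the expected uncovered fraction at 12x is on the order of a few bases per million, which is consistent with only a small number of gaps needing closure by PCR.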

BLAST searches (Altschul et al. (1990) supra) were performed to identify
sequences in public databases that were related to those in the ASTH1 region.
Sequence-based gene prediction was done with the GRAIL [Roberts (1991)
Science 254:805] and Geneparser [Snyder and Stormo (1993) Nucleic Acids Res.
21:607] programs. Genomic sequence and feature data was stored in ACeDB.

Development of new microsatellite markers for genetic refinement of the ASTH1 
region 

Additional informative polymorphic markers were important for the genetic
refinement of the ASTH1 region. 'Saturation' cloning of every microsatellite in the
1 Mb region surrounding D11S907 was performed. Plasmid libraries were
constructed from PFGE purified DNA from each YAC, prescreened with a primer
from each known microsatellite marker, then screened with radiolabeled (CA)15 or
a pool of trinucleotide and tetranucleotide repeat oligonucleotides. The plasmid
inserts were sequenced, and the set of sequences was compared with those of the
known microsatellite markers in the region, using Power assembler (ABI) or
Sequencher (Alsbyte). Primer pairs flanking each novel microsatellite repeat were
designed, and the heterozygosity of each new marker was tested by Batched
Analysis of Genotypes (BAGs; LeDuc et al., 1995, PCR Methods and Applications).
Additional microsatellites were found by analysis of the genomic sequence in
ACeDB. Table 1 lists all the microsatellite markers used for genotyping in the
ASTH1 region and their repeat type, source and primers. Table 1B lists some
repeat sequences.
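
Marker heterozygosity, referred to above, can be computed from a panel of genotypes in two common ways. The sketch below is illustrative only and is not the BAGs procedure itself; the function names and the allele sizes in the example are hypothetical. Observed heterozygosity is the fraction of individuals carrying two different alleles, and expected heterozygosity is one minus the sum of squared allele frequencies.

```python
from collections import Counter


def observed_heterozygosity(genotypes):
    """Fraction of genotypes whose two alleles differ."""
    heterozygous = sum(1 for allele_a, allele_b in genotypes if allele_a != allele_b)
    return heterozygous / len(genotypes)


def expected_heterozygosity(genotypes):
    """1 - sum(p_i^2), with p_i the frequency of allele i in the sample."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Hypothetical genotypes: microsatellite allele sizes in bp for four individuals.
sample = [(150, 152), (150, 150), (152, 154), (150, 154)]
print(observed_heterozygosity(sample))
print(expected_heterozygosity(sample))
```

A marker with more alleles at more even frequencies gives a higher value by either measure and is therefore more informative for linkage and TDT analysis.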

TABLE 1

Polymorphic microsatellite markers in the ASTH1 region

SEQ ID  MARKER                 PRIMER
160.    110O5GT1               CTGCTGTGGACGAATAGG
161.                           TCAATATAATCTTGCTTAACTTGG
162.    139C7GT1               GACCTGTTTGGGTTGATTTCAG
163.                           GTTTCTTACAGTGTCTTGCTATCACATCACC
164.    171L24AT1              GAGGACTGGCAGTACCAAGTA;U\C
165.                           GTTTCTTTGGTTCATTCTAAGATGGCTGG
166.    253E6GT1               GCTGAGGCAGGAGAAAAGACAAG
167.                           GTTTCTTCATGCAAAGGTCAGGAGGTAGG
168.    253E6TE1               GTTGCTTCCAGACGAGGTACATG
169.                           GTTTCTTCAATGGCTCCACAAACATCTCTG
170.    253E6TR1               AGGTTTAGGGGACAGGGTTTGG
171.                           GTTTCTTTCCTGGCTAACACGGTGAAATC
172.    65P14                  GTTTCTTATTGCCTCCTCCCAAAATTC
173.                           AGAGGCCACTGGAAGACGAA
174.    65P14GT1               AACTGGAGTCAGGCAAAACGTG
175.                           GTTTCTTTGGCTGGTAAGGAAAGAT^CCAC
176.    65P14TE1               GGCTAGGTTCATAAACTCTGTGCTG
177.                           GTTTCTTGATTGTTTGAGATCCTTGACCCAG
178.    65P14TE2               GCCGAAATCACAACACTGCATC
179.                           GTTTCTTGATTCTGCTCTTACTCTTGCCCC
180.    65P14TR1               GTAATAGAACCAAAGGGCTGAGAC
181.                           GTTTCTTCGGAGTCAGACCTTACATTGTTGAG
182.    774F                   ATCTCCCTGCTACCCACCTT
183.                           GTTTCTTGTTTTCAGTGAGTTTCTGTTGGG
184.    774J                   GTGTGCCAAACAACATTTGC
185.                           GTTTCTTCAAGCCATCAAGCTAGAGTGG
186.    774L                   GGGCTTTTAAACCCTTATTTAACC
187.                           GTTTCTTAGGTGATCTCAGAGCCACTCA
188.    774N                   AGGGCAGGTGGGAACTTACT
189.                           GTTTCTTTGGAGTCAGTTGAGCTTTCTACC
190.    774O                   TGAACTTGCCTACCTCCCAG
191.                           GTTTCTTAGCATATATCCTTACACAAGCACA
192.    774T                   CATGGTTCCAAAGGCAAGTT
193.                           GTTTCTTTTGAGGCTGAATGAGCTGTG
194.    86J5AT2                ACAGGTGGGAAGACTGAATGTC
195.                           GTTTCTTGCAGTACACATCACATGACCTTG
196.    86J5CA1                GAAATAGGCGGAAACTGGTTC
197.                           GTTTCTTCGTTGTGGTTGTTCAGAAAGG
198.    86J5GT1                GGTCAAGTGTTCAGAACGCATC
199.                           GTTTCTTGCAGGGATTATGCTAGGTCTGTAG
200.    86J5GT2                AGCACTTCTGAGGAAGGGACAC
201.                           GTTTCTTAGGGCAGGCAGACATACAAAC
202.    86J5TE1                GCCAATGTGTTCCTAGAGCGAC
203.                           GTTTCTTTTAAAGGGGGTAGGGTGTCACC
204.    8E.PENTA1              GGAAGGGAAAAGGACAAGGTTTTG
205.                           GTTTCTTAGCAAGAGCACTGGTGTAGGAGTC
206.    8EP04D05               GCTTTTCAAGCACTTGTCTC
207.                           TGGGATTGTGACTTACCATG
208.    8O16GT1                ACTTGGTGTCTTATAGAAAGGTG
209.                           GTTTCTTAGCTGTGTTTGCTGCATC
210.    8O16GT2                AGATGTGTGATGAGATGCAG
211.                           GTTTCTTCAAATAGTGCAACAAACCC
212.    AFM198YB10(G)          TGTCATTCTGAAAGTGCTTCC
213.                           GTTTCTTCTGTAACTAACGATCTGTAGTGGTG
214.    AFM205YG5(G)           TATCAAGGTAATATAGTAGCCACGG
215.                           AGGTCTTTCATGCAGAGTGG
216.    AFM206XB2(G)           ATTGCCAAAACTTGGAAGC
217.                           AGGTGACATATCAAGACCCTG
218.    AFM283WH9(G)           TTGTCAACGAAGCCCAC
219.                           GTTTCTTGCAAGATTGTGTGTATGGATG
220.    AFM324YH5(G)           GCTCTCTATGTGTTTGGGTG
221.                           AAGAGTACGCTAGTGGATGG
222.    AFMA154ZD1(G)          TCCATTAGACCCAGAAAGG
223.                           GTTTCTTCACCAGGCTGAGATGTTACT
224.    ASMI14                 AATCGTTCCTTATCAGGTAATTTGG
225.                           GTTTCTTCAAAGAAAGCAATTCCATCATAACA
226.    ASMI14T                GCATTTGTTGAAGCAAGCGG
227.                           CTTTGTTCCTTGGCTGATGG
228.    CA11_11                AATAGTACCAGACACACGTG
229.                           CAATGGTTCACAGCCCTTTT
230.    CA39_2                 AGCCTGGGAGACAGAGTGAG
231.                           GTTTCTTGCACTTTTTGGGGAAGGTG
232.    CD59(L)                GTTCCTCCCTTCCCTCTCC
233.                           GTTTCTTTCAGGGACTGGATTGTAG
234.    D11S1301(U)            GTGTTCTTTATGTGTAGTTC
235.                           GTTTCTTGGCAACAGAGTGAGACTCA
236.    D11S1751(G)            GTGACATCCAGTGTTGGGAG
237.                           GTTTCTTCCTAAGCAAGCAAGCAATCA
238.    D11S1776(G)            AAAGGCAATTGGTGGACA
239.                           GTTTCTTTTCAATCCTTGATGCAAAGT
240.    D11S1900(U)            GGTGACAGAGCAAGATTTCG
241.                           GTTTCTTGTAGAGTTGAGGGAGCAGC
242.    D11S2008/D11S1392(C)   CATCCATCTCATCCCATCAT
243.                           GTTTCTTTTCACCCTACTGCCAACTTC
244.    D11S2014(C)            CCGCCATTTTAGAGAGCATA
245.                           GTTTCTTTTCTGGGACAATTGGTAGGA
246.    D11S4200(G)            TTTGTGTTATTATTTCAGGTGC
247.                           GTTTCTTGTTTTTTGTTTCAGTTTAGGAAC
248.    D11S907(G)             CATACCCAAATCGTTCTCTTCCTC
249.                           GTTTCTTGGAAAAGCAAAGf GCATCGTAGAG
250.    D11S935(G)             TACTAACCAAAAGAGTTGGGG
251.                           CTATCATTCAGAAAATGTTGGC
252.    GATA-P18492(C)         GTATGGCAGTAGAGGGCATG
253.                           AAGGTTACATTTCAAGAAATAAAGT
254.    GATA-P6915(C)          CTGTTCAGGCCTCAATATATACC
255.                           AAGAGGATAGGTGGGGTTTG
256.    L19CA3                 CCTCCCACCTAGACACAAT
257.                           ATATGATCTTTGCATCCCTG
258.    L19PENTA1              AAGAAAGACCTGGAAGGAAT
259.                           AAACAGCAAAACCTCATCTC
260.    L19TETRA5              CCACCACTTATTACCTGCAT
261.                           TGAATGAATGAATGAACGAA
262.    LMP2                   AACTGTGATTGTGCCACTGCACTC
263.                           GTTTCTTCACCGCCTTTATCCCTCAAATG
264.    LMP3                   GATGGGTGGAGGGCAGTTAAAG
265.                           GTCAAGCAACTTGTCCAAGGCTAC
266.    LMP4                   CAGGCTATCAGTTTCCTTTGGAG
267.                           GGCAGGTAATACTGGAGAATTAGG
268.    LMP7                   GACGGATCTCAGAGCCACTC
269.                           GTTTCTTAAAAGATAAGGGCTTTTAAACC
270.    T18_5                  AGTTTCACAGCTTGTTATGG
271.                           GGTTGATGAAGTGAGACTTT
272.    T29_9                  ATGGTGGATGCATCCTGTG
273.                           GTTTCTTGTATTGACTCCTCCTCTGC
274.    774L                   CAGTAAACAT
275.                           TGTTGAGTGG
276.    774N                   TCTCCTCAATGTGCATGT
277.                           ATTCTACATA
278.    ASMI14                 GTGTTTGCAT
279.                           ACAAGTTGGC
280.    CA11_11                TAGTACCAGA
281.                           TACATCCAAGAAAA

The source of marker was Sequana Therapeutics, Inc. unless a
letter in parentheses is indicated after the name, where G =
Genethon; L = Nothen and Dewald (1995) Clin. Genet. 47:165; U =
the Utah genome center, see: The Utah Marker Development Group
(1995) Am. J. Hum. Genet. 57:619; C = the Cooperative Human
Linkage Center.



Table 1B

SEQ   Marker     Repeat and flanking sequence
282.  CA39_2     GAGACTCTGA(CA)nAATATATATA
283.  774F       TGTTGATCGC(CA)nAACCAAAATC
284.  774J       AATGCATGTA(TG)2TATA(TG)nGTGTGGTATG(TG)3TACATATGCG
285.  774O       CCTCCCAGAA(CA)nATCATGATAA
286.  L19PENTA1  AGACAGTCTCAAAAAAT(ATTTT)nAAAGAAAAAGCTGGATAAAT
287.  65P14TE1   AACTAGCTTTAAGAAAATAAGAAGAAAAAGAAAGAAG(AAAG)2TAAG
                 (AAAG)nAGAAAGAAAAG(AAAG)nAAAAG(AAAG)nAGGAATGATTGAC
288.  65P14      CGCGCACATA(CA)nCCCTTTCTCT
289.  774L       CAGTAAACAT(CA)nTGTTGAGTGG
290.  774N       TCTCCTCAATGTGCATGT(GTGC)2ATGA(GTGC)2(AC)nATTCTACATA
291.  ASMI14     GTGTTTGCAT(GT)nT(GT)3ACAAGTTGGC
292.  CA11_11    TAGTACCAGA(CA)2CG(TG)2(CA)2GGCAAGCG(CA)nC(CA)3
                 TACATCCAAGAAAA



Genetic refinement of the ASTH1 region

The microsatellite markers isolated from YACs from the ASTH1 region were
genotyped in both the Tristan da Cunha and Toronto cohorts. Genetic refinement
of the ASTH1 region was accomplished by applying the transmission/disequilibrium
test (TDT; Spielman et al. (1993) Am. J. Hum. Genet. 52:506) to genetic data from
the Tristan and Toronto populations, at markers throughout the ASTH1 region. The
TDT statistic reflects the level of association between a marker allele and disease
status. A multipoint version of the TDT test controls for variability in
heterozygosities between loci, and results in a smoother regional TDT curve than
would a plot of single locus TDT data. Significance of a TDT value is determined by
means of the χ² test; a χ² value of 3.84 or greater is considered statistically
significant at a probability level of 0.05. Figure 1 shows graphs of χ² values for key
ASTH1 region markers for both history of asthma with positive methacholine
challenge, for the Toronto triad families. χ² is plotted vs. genomic location of the
marker on the physical map.
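
For orientation, the single-locus TDT statistic of Spielman et al. can be sketched in a few lines. This is a minimal illustration of the standard single-locus form only, not the multipoint version used for the analysis described above; the function names and the transmission counts in the example are hypothetical. With b the number of times heterozygous parents transmit the allele of interest to affected offspring and c the number of times they do not, the statistic is (b - c)² / (b + c), which is referred to a χ² distribution with one degree of freedom.

```python
import math


def tdt_chi_square(transmitted, untransmitted):
    """Single-locus TDT statistic (b - c)^2 / (b + c)."""
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no informative (heterozygous parent) transmissions")
    return (b - c) ** 2 / (b + c)


def chi_square_p_value_1df(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical counts: allele transmitted 30 times, not transmitted 10 times.
statistic = tdt_chi_square(30, 10)
print(statistic, chi_square_p_value_1df(statistic))

# The 0.05 significance threshold quoted in the text.
print(chi_square_p_value_1df(3.84))
```

The second call shows why 3.84 serves as the cut-off: it is the value at which the one-degree-of-freedom upper-tail probability is approximately 0.05.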

The Toronto TDT peak is located at marker D11S2008 (χ² = 11.6, p < .0001).
The marker allele in disequilibrium is fairly rare (freq = 6%), representing the fourth
most common allele at this marker. The relative risk of affection vs. normal for this
allele is 5.25. This is also the peak marker for linkage and linkage disequilibrium in
Tristan da Cunha, indicating that the ASTH1 gene is very close to this marker. The
markers defining the limits of linkage disequilibrium were D11S907 and 65P14TE1.
The physical size of the refined region is approximately 100 kb.

A significant TDT test reflects the tendency of alleles of markers located near
a disease locus (also said to be in "linkage disequilibrium" with the disease) to
segregate with the disease locus, while alleles of markers located further from the
disease locus segregate independently of affection status. An expectation that
derives from this is that a population for which a disease gene (i.e., a disease
predisposing polymorphism) was recently introduced would show statistically
significant TDT over a larger region surrounding the gene than would a population
in which the mutant gene had been segregating for a greater length of time. In the
latter case, time would have allowed more opportunity for markers in the vicinity of
the disease gene to recombine with it. This expectation is fulfilled in our
populations. The Tristan da Cunha population, founded only 10 generations ago,
shows a broader TDT curve than does the set of Toronto families, which are mixed
European in derivation and thus represent an older and more diverse, less recently
established population.

Gene isolation and characterization

The tiling path of BACs, cosmids and PAC clones was subjected to exon
trapping and cDNA selection to isolate sequences derived from ASTH1 region 
genes. Exon trap clones were isolated on the basis of size and ability to cross- 
hybridize. Approximately 300 putatively non-identical clones were sequenced. 
cDNA selection was performed with adult and fetal lung RNA using pools of tiling 
path clones. The cDNA selection clones were sequenced and the sequences 
assembled with those of the exon trap clones. Representative exon trapping 
clones spanning each assembly were chosen, and arranged as "masterplates" (96- 
well microtitre dishes) of clones. Exon trap masterplate clones and cDNA selection 
clones were subjected to expression studies. 

Human multi-tissue Northern blots were probed with PCR products of
masterplate clones. In some cases, exon trapping clones did not detect RNA 
species, either because they did not represent expressed sequences, or 
represented genes with very restricted patterns of expression, or due to small size 
of the exon probe. 

Masterplate clones detecting discrete RNA species on Northern blots were
used to screen lambda phage based cDNA libraries chosen on the basis of the 
expression pattern of the clone. The sequences of the cDNAs were determined by 
end sequencing and sequence walking. cDNAs were also isolated, or extended, by 
5' and 3' rapid amplification of cDNA ends (RACE). In most cases, 5' RACE was 
necessary to obtain the 5' end of the cDNA. 

ASTH1I and ASTH1J were detected by exon trapping. ASTH1I exons
detected a 2.8 kb mRNA expressed at high levels in trachea and prostate, and at
lower levels in lung and kidney. ASTH1I exons were used as probes to screen
prostate, lung and testis cDNA libraries; positive clones were obtained from each of
these libraries. Isolation of an ASTH1I cDNA clone from testis demonstrates that this
gene is expressed in this tissue, and possibly others, at a level not detectable by
Northern blot analysis.

ASTH1J exons detected a 6.0 kb mRNA expressed at high levels in the
trachea, prostate and pancreas and at lower levels in colon, small intestine, lung
and stomach. Pancreas and prostate libraries were screened with exon clones
from ASTH1J. cDNA clone end sequences were assembled using Sequencher
(Alsbyte) with the sequences of the exon trapped clones, producing sequence
contigs used to design sequence walking and RACE primers. The additional
sequences produced by these methods were assembled with the original
sequences to produce longer contigs of cDNA sequences. It was evident from the
sequence assemblies that both ASTH1I and ASTH1J are alternatively spliced
and/or have alternative transcription start sites at their 5' ends, since not all clones
of either gene contained the same 5' sequence.

ASTH1J has three splice forms. The first, the alt1 form, was found in prostate
and lung cDNA clones; in it the exons (illustrated in Figure 1) are found in
the order: 5' a, b, c, d, e, f, g, h, i 3'. A second form, alt2, in which the exon order is:
5' a2, b, c, d, e, f, g, h, i 3', was seen in a pancreas cDNA clone. A third form, alt3,
contains an alternate exon, a3, between exons a2 and b. The start codon is within
exon b, so that the open reading frame is identical for the three forms, which differ
only in the 5' UTR. The ASTH1J cDNAs shown as SEQ ID NO:2 (form alt1); SEQ
ID NO:3 (form alt2); SEQ ID NO:4 (form alt3) are 5427, 5510 and 5667 bp in length,
respectively. The sequences of the entire protein coding region and alternate 5'
UTRs are provided. The 3' terminus, where the polyA tail is added, varies by 7 bp
between clones. The provided sequences are the longest of these variants. The
encoded protein product is provided as SEQ ID NO:5.

ASTH1I was seen in three isoforms denoted as alt1, alt2, and alt3. The
exons of ASTH1I and ASTH1J were given letter designations before the
directionality of the cDNA was known, so the order is different for the two genes. In
the alt1 form of ASTH1I, exons are in the following order: 5' i, f, e, d, c, b, a 3'. In
the alt2 form of ASTH1I, an alternative 5' exon, j, substitutes for exon i, with the
following exon arrangement: 5' j, f, e, d, c, b, a 3'. The alt3 form of the gene has
the exon order: 5' f, k, h, g, e, d, c, b, a 3'. The alternative splicing and start
codons in each of exons i, f and e give the three forms of ASTH1I protein different
amino termini. The common stop codon is located in exon a, which also contains a
long 3' UTR. Two polyadenylation signals are present in the 3' UTR; some cDNA
clones end with a polyA tract just after the first polyA signal and for others the polyA
tract is at the end of the sequence shown. Since the sequences shown for the alt1,
alt2, and alt3 forms of ASTH1I (2428 bp, 2280 bp and 2498 bp, respectively) are
close to the estimated Northern blot transcript size of 2.8 kb, these sequences are
essentially full length.



EST matches 

The nucleotide sequences of the alt1, alt2 and alt3 forms of ASTH1J and the
alt1, alt2 and alt3 forms of ASTH1I were used in BLAST searches against dbEST in
order to identify EST sequences representing these genes. Perfect or near perfect
matches were taken to represent sequence identity rather than relatedness.
Accession numbers T65960, T64537, AA055924 and AA055327 represent the
forward and reverse sequences of two clones which together span the last 546 bp
(excluding the polyA tail) of the 3' UTR of ASTH1I. No ESTs spanned any part of
the coding region of this gene. One colon cDNA clone (accession number
AA149006) spanned 402 bp including the last 21 bp of the ASTH1J coding region
and part of the 3' UTR.

Intron/exon structure determination 

The genomic organization of genes in the ASTH1 region was determined by
comparison by BLAST of cDNA sequences to the genomic sequence of the region.
The genomic sequence of the ASTH1 region 5' to and overlapping ASTH1J is
provided in SEQ ID NO:1. Genomic structure of the ASTH1I and ASTH1J genes is
shown in Figure 1; the intron/exon junction sequences are in Table 2.



TABLE 2: Genomic organization of the ASTH1I and ASTH1J genes.
*Exonic sequences are upper case, flanking sequences lower case.

SEQ NO  Exon  Size of     Sequences at the ends of and flanking
              exon (bp)   the exons of ASTH1I and ASTH1J*

ASTH1I
293.    i     >214        ggaggctgagCAGGGGTGCC...
294.                      ...ACTCCCACAGgtacctgcag
295.    j     >66         ...CTGCCCTCACgtaagcgcct
296.    f     125         gctgttgcagGGTAATGTTG...
297.                      ...CATCAGACAGgtgcgtaca
298.    k     226         ggctggtgagGAGGGGCTGA...
299.                      ...CGCTCTGTGGgtgagcttca
300.    h     93          tgtggaatagCCCAATTACA...
301.                      ...AGGGTGCTGAgtgagtagta
302.    g     79          ttcttttcagGCCCTCGTGT...
303.                      ...TGCTGACCCGgtatggtggt
304.    e     232         tttggtgcagCCTGTGACTC...
305.                      ...CGCACACAAGgtcagtgttc
306.    d     51          tctttcccagGTTACTCCTT...
307.                      ...ATCAAAGACTgtaagtaacc
308.    c     69          tctatttcagATGCTGATTC...
309.                      ...AGTAGAACAAgtaagtgcag
310.    b     196         ttttcaaaagGCCTCCAAAG...
311.                      ...GAGCCCTGAGgtaagttaat
312.    a     1522        gctttttcagATACTACTAT...
313.                      ...TAACATGTTCaactgtctgt

ASTH1J
314.    a     146         tgttatatgcATTTATCTTC...
315.                      ...GGTAAATGAGgtaagtcctg
316.    a2    229         tcttgttaagATCGCTCTCT...
317.                      ...CCTTGCCCAGgttctcttaa
318.    a3    157         gcaatcgcacCTGCACACCC...
319.                      ...ACTGCCCATTtctggtaaag
320.    b     100         cccctaacagATCATGATTC...
321.                      ...ACGTGCAATGgtaagagggc
322.    c     246         tgttttgcagTTTCCAGTGG...
323.                      ...AAGTGGAACGgtgactctct
324.    d     63          tccttcacagGCCAGTGCAG...
325.                      ...GAACAAACTGgtgagtagta
326.    e     69          ttttttgtagAGCCTTCCAT...
327.                      ...AGCACAGTAGgtaactaact
328.    f     69          atggccacagATTTGTTGGA...
329.                      ...CTTCCTGTTGgtaagctgtc
330.    g     63          ttctccttagCAGAGTCACC...
331.                      ...AAAAAGCACAgtaagttggc
332.    h     196         ttttcatcagACCCGAGAGG...
333.                      ...GAGCTATGAGgtgaggagtt
334.    i     4457        tttgttacagATATTACTAC...
335.                      ...AGCCTGGAAAtgcgtgtttc
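
The junction sequences of Table 2 can be checked against the usual GT-AG rule, under which an intron begins with gt and ends with ag. The sketch below is illustrative only: the function names are hypothetical, and it relies solely on the table's own convention that exonic bases are upper case and flanking bases lower case. The example strings are taken from the ASTH1I and ASTH1J entries with the ellipses omitted.

```python
def is_splice_acceptor(junction):
    """True if the lower-case flank preceding the exon ends in 'ag'."""
    intron = junction.rstrip("ACGT")  # drop the trailing upper-case exon bases
    return intron.islower() and intron.endswith("ag")


def is_splice_donor(junction):
    """True if the lower-case flank following the exon starts with 'gt'."""
    intron = junction.lstrip("ACGT")  # drop the leading upper-case exon bases
    return intron.islower() and intron.startswith("gt")


# ASTH1I exon f: 5' boundary (acceptor side) and 3' boundary (donor side).
print(is_splice_acceptor("gctgttgcagGGTAATGTTG"))
print(is_splice_donor("CATCAGACAGgtgcgtaca"))

# 3' end of ASTH1I exon a (last exon) and 5' end of ASTH1J exon a (first exon):
# neither flank is expected to be a splice site.
print(is_splice_donor("TAACATGTTCaactgtctgt"))
print(is_splice_acceptor("tgttatatgcATTTATCTTC"))
```

Boundaries that fail the check are not necessarily errors: the outer ends of first and last exons are transcript termini rather than splice junctions, as the last two examples show.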

The deduced ASTH1I and ASTH1J proteins 

The protein encoded by ASTH1J (SEQ ID NO:5) is 300 amino acids in
length. A BLASTP search of the protein sequence against the public nonredundant
sequence database (NCBI) revealed similarity to one protein domain of transcription
factors of the ets family. The ets family, named for the E26 oncoprotein which
originally defined this type of transcription factor, is a group of transcription factors
which activate genes involved in a variety of immunological and other processes, or
implicated in cancer. The family members most similar to ASTH1I and ASTH1J are:
ETS1, ESX, ETS2, ELF, ELK1, TEL, NET, SAP-1, NERF and FLI. Secondary
structure analysis and comparison of the protein sequence to the crystal structure of
the human ETS1-DNA complex (Werner et al. (1995) Cell 83:761) confirmed that it
has a winged helix turn helix motif characteristic of some DNA binding proteins
which are transcription factors.

Multiple sequence alignment of ASTH1I, ASTH1J, and other ETS-domain
proteins detected a second, N-terminal domain shared by ASTH1I, ASTH1J and
some, but not all, ETS-domain proteins. Conservation of this motif has been
observed (Tei et al. (1992) Proc. Natl. Acad. Sci. USA 89:6856-6860), and its
involvement in protein self-association has been documented for TEL, an ETS-domain
protein, upon its fusion with platelet-derived growth factor β receptor (Carrol
et al. (1996) Proc. Natl. Acad. Sci. USA 93:14845-14850). Alignment of the
N-terminal conserved domain in the ETS proteins was converted into a generalized
sequence profile to scan the protein databases using the Smith-Waterman
algorithm. This search revealed that the N-terminal domain in ASTH1I, ASTH1J and
other ETS-domain proteins belongs to the SAM-domain family (Schultz et al. (1997)
Protein Science 6:249-253). SAM domains are found in diverse developmental
proteins where they are thought to mediate protein-protein interactions. Thus, both
ASTH1I and ASTH1J are predicted to contain two conserved modules, the N-terminal
protein interaction domain (SAM-domain) and the C-terminal DNA-binding
domain (ETS-domain). The sequence segments between these two domains are
predicted to have elongated, non-globular structure and may be hinges between the
two functional domains in ASTH1I and ASTH1J.
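
As an illustration of the Smith-Waterman local alignment mentioned above, the following minimal sketch scores the best local alignment between two plain sequences. It is not the generalized profile search described in the text: the match, mismatch and linear gap scores are assumptions chosen only for illustration, and the function name is hypothetical.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b.

    Scoring values are illustrative assumptions, not those used in the source.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman("TTACGTAA", "GGACGTCC"))
```

A profile search replaces the fixed match/mismatch scores with position-specific scores derived from the multiple alignment; the recurrence itself is unchanged.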

The ASTH1I alt1 (SEQ ID NO:7), alt2 (SEQ ID NO:9) and alt3 (SEQ ID
NO:11) forms are 265, 255 and 164 amino acids in length, respectively, and differ at
their 5' ends. The ASTH1I and ASTH1J proteins show similarity to each other in
the ets domain and between ASTH1J exon c and ASTH1I exon e. They are more
related to each other than to other proteins. Over the ets domain they are 66%
similar (i.e., have amino acids with similar properties in the same positions) and 46%
identical to each other. All three forms of ASTH1I have the helix turn helix motif
located near the carboxy terminal end of the protein.
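
The distinction between "similar" and "identical" can be sketched as follows, assuming two pre-aligned sequences of equal length with "-" for gaps. The grouping of amino acids by side-chain properties is a hypothetical simplification (the source does not state which similarity scheme was used), and identical positions are counted as similar.

```python
# Hypothetical grouping of amino acids by side-chain properties; illustrative only.
SIMILAR_GROUPS = ["AVLIMC", "FWY", "ST", "NQ", "DE", "KRH", "G", "P"]
GROUP_OF = {aa: idx for idx, group in enumerate(SIMILAR_GROUPS) for aa in group}


def identity_and_similarity(a: str, b: str) -> tuple[float, float]:
    """Return (% identical, % similar) over aligned, ungapped positions."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    identical = sum(x == y for x, y in pairs)
    # Identical residues count as similar; otherwise both must share a known group.
    similar = sum(
        x == y or (x in GROUP_OF and GROUP_OF[x] == GROUP_OF.get(y))
        for x, y in pairs
    )
    return 100.0 * identical / len(pairs), 100.0 * similar / len(pairs)


print(identity_and_similarity("KDEL", "RDQL"))
```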

The alternate forms of the ASTH1I protein may differ in function in critical
ways. The activity of ets transcription factors can be affected by the presence of
independently folding protein structural motifs which interact with the ets protein
binding domain (helix loop helix). The differing 5' ends of the ASTH1I proteins may
help modulate activity of the proteins in a tissue-specific manner.

Polymorphism analysis of ASTH1I and ASTH1J

Affected and unaffected individuals from the Toronto cohort were used to
determine sequence variants, as were approximately 25 controls derived from
populations not selected for asthma. Affected and unaffected individuals from the
Tristan da Cunha population were also chosen; the set to be assayed was also
selected to represent all the major haplotypes for the ASTH1 region in that
population. This ensured that all chromosome types for Tristan were included in the
analysis.

Polymorphism analysis was accomplished by three techniques: comparative
(heterozygote detection) sequencing, radioactive SSCP and fluorescent SSCP.
Polymorphisms found by SSCP were sequenced to determine the exact sequence
change involved.

PCR and sequencing primers were designed from genomic sequence
flanking each exon of the coding region and 5' UTRs of ASTH1I and ASTH1J. For
fluorescent SSCP, the forward and reverse PCR primers were labeled with different
dyes to allow visualization of both strands of the PCR product. In general, a variant
seen in one strand of the product was also apparent in the other strand. For
comparative sequencing, heterozygotes were also detected in sequences from both
DNA strands.

Polymorphisms associated with the ASTH1I locus are listed in Table 3. The
sequence flanking each variant is shown. Polymorphisms were also deduced from
comparison of sequences from multiple independent cDNA clones spanning the
same region of the transcripts, and comparison with genomic DNA sequence. The
polymorphisms in the long 3' UTR regions of these genes were found by this
method. One polymorphism in each gene is associated with an amino acid change
in the protein sequence. An alanine/valine difference in exon c of ASTH1J is a
conservative amino acid change. A serine/cysteine variant in exon g of ASTH1I is
not a conservative change, but would be found only in the alt3 form of the protein.

The polymorphisms in the ASTH1I and J transcribed regions were genotyped
in the whole Tristan da Cunha and Toronto populations, as well as in a larger
sample of non-asthma selected controls, by high throughput methods such as OLA
(oligonucleotide ligation assay; Tobe et al. (1996) Nucl. Acids Res. 24:3728) or
Taqman (Holland et al. (1992) Clin. Chem. 38:462), or by PCR and restriction
enzyme digestion. The population-wide data were used in a statistical analysis for
significant differences in the frequencies of ASTH1I or ASTH1J alleles between
asthmatics and non-asthmatics.
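
The text does not name the statistical test used. As one common way to compare allele frequencies between two groups, the sketch below computes a Pearson chi-square statistic (one degree of freedom, no continuity correction) on a 2x2 table of allele counts. The function name and the counts in the usage line are hypothetical and are not data from this study.

```python
import math


def allele_chi_square(case_a: int, case_b: int, ctrl_a: int, ctrl_b: int) -> tuple[float, float]:
    """Return (chi-square, p-value) for allele counts a/b in cases versus controls."""
    n = case_a + case_b + ctrl_a + ctrl_b
    row_case, row_ctrl = case_a + case_b, ctrl_a + ctrl_b
    col_a, col_b = case_a + ctrl_a, case_b + ctrl_b
    if 0 in (row_case, row_ctrl, col_a, col_b):
        raise ValueError("every row and column needs at least one observation")
    chi2 = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2 / (row_case * row_ctrl * col_a * col_b)
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts, for illustration only.
print(allele_chi_square(30, 10, 10, 30))
```

With small counts in any cell, an exact test would usually be preferred over this approximation.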
TABLE 3: POLYMORPHISMS IN THE ASTH1I AND ASTH1J GENES.

SEQ Polymorphism Location Sequence

ASTH1I Transcribed region
16. EXON B (+)170 ACAGAATGACETATGAAAAGT
17. INTRON D (+)15 GTAACCAAGCKCAAGCCACCC
18. INTRON F (+)24 AAGGAGCCCAXCTGAGTGCAG
19. EXON G (+)62 ser→cys CGTTCCATCTSTGCTCTGTGC
20. EXON H (+)77 AGCGCCTCGGYTGGCTGAGGG
21. EXON A 3' UTR (+)1176 TGTATTCAAGXGCTATAACAC
22. EXON I (+)76 CACTGAGAAGCC£Z:iACAGGCCTGT
23. EXON I (+)86 CCCACAGGCCWGTCCCTCCAA
24. INTRON J (+)93 CGTCCATCTCYAGCTCCAGGG
ASTH1J Transcribed region
25. EXON A 5' UTR (+)38 GACTTGATAAXGCCCGTGGTG
26. EXON A 5' UTR (+)39 ACTTGATAACfiCCCGTGGTGC
27. EXON A 5' UTR (+)99 CTCCCCTCCAHGAGCCACAGC
28. INTRON A (+)224/225 ATTTCCTGCATlZnGTCTGGACTT
29. INTRON A (+)48 ATCCAAACACXTGAGTGGAT^
30. EXON A3 (+)28 AGTTTCCTCARTGCGGGAGCT
31. EXON C (+)158 GCGAGCACCTXTGCAGCATGA
32. EXON C (+)190 ala→val TTCACCCGGGXGGCAGGGACG
33. INTRON D (-)36/37 CTGGGGAAAAlGSl/lGATCGCTGAC
34. INTRON F (-)22 GTCAATTAAAXGGCTCTCATT
35. INTRON G (-)27 TAGATCATTCETAACCTGCCT
36. EXON I (3' UTR) (+)22 AAAGAGAAATWCTGGAGCGTG
37. EXON I (3' UTR) (+)220 ATGAGGGGAAMAAGAAACTAC
38. EXON I (3' UTR) (+)475 TTTTGTATGTl^ACATGATTTA
39. EXON I (3' UTR) (+)871 AGCTTGGTTCXTTTTTGCTCC
40. EXON I (3' UTR) (+)1084 TTGACACCAGRAACCCCCCAG
5' to ASTH1J
41. CAAT box -165 AAATGAGCCAETGTTTGTAAT
42. 5PW1J_P01+399 ATCCATTTTGXATTCCTCATT
43. 5PW1J_P01+1604 CTGGAGCTCARACCAGACAGC
44. 5PW1J_P02+1382 GCCAGTGCAGSCATCATTACC
45. 5PW1J_P03+128 AGTTCAAATCETAATTTTTAT
46. 5PW1J_P03+556 TCATCAGAATXTAAATCTCCC
47. 5PW1J_P03+712 GGAGATTCAGA/-TGAAGCAAGA
48. 5PW1J_P03+781 TTTTTCCACAXCCAGCCTGGC
49. 5PW1J_P03+791 CCCAGCCTGGXGAACCCTGGC
50. 5PW1J_P03+820 CTCTTCATCAXGGTCAAATAC
51. 5PW1J_P03+1530 CAACTTGCTGXCAAAGTGCTG
52. 5PW1J_P03+1605 TACTATGTGCXAGATACTAAG
53. 5PW1J_P04+542/543 ATGCCACTTTRRACAACTTGAG
54. [location illegible] CGCATGCCTGKAAAGAAGAGA
55. [location illegible] GGATAAGCACMAGTGAGCCTG
56. [location illegible] AAAGCCAGACRGCAACTTGTG
57. 5PW1J_P04+1430 TCTCAAAAAGRGTGATAGGAG
58. 5PW1J_P05+334 TCTGAATCCTSTCTCCTCCTT
59. 5PW1J_P05+749 TAGAAC CAGGWTGTGGG Arr A
60. 5PW1J_P05+915 TTCTTGTGTCRGGCGrAAAAr
61. 5PW1J_P06+529 AACCAAdATGRAGAAAGGrriA
62. 5PW1J_P06+1290 AATAAACTATRGTTCACCTAG
63. 5PW1J_P06+1573 ACATATTTGTRTCTCATATGA
64. 5PW1J_P06+1661 CAAAGCAGTTYrTAATAATPr*
65. 5PW1J_P07+335 AGATCCTAACYGGGGCCTCCT
66. 5PW1J_P07+731 CTCTTTCTCTYTGCTTCICITCfr!
67. 5PW1J_P07+1024 TTAGGAATCCWCAAATATGTA
68. 5PW1J_P07+1610 GTCTGACTCCRCCTCCCTCAT
69. 5PW1J_P08+398 GAATCACATCRTGAGAAATGT
70. 5PW1J_P08+439 AATTCAATCCTTCACAGACTT
71. 5PW1J_P08+580 GTGTAGCCAGRGTTGCTAATT
72. 5PW1J_P08+762 CCTAGAAATASCCAAGGGCAC
73. 5PW1J_P08+952 AAATTCTCATRCCTCACCCTC
74. 5PW1J_P08+1172 TCCCACCCCTETCACCTTCAT
75. 5PW1J_P08+1393 CCTCATTCTCSGAAGCCAACA
76. 5PW1J_P08+1433 GAAGAGCCGTYCAGTCCCTTT
77. 5PW1J_P08+1670 TCCATAGGCTYTTTATTTGGC
78. 5PW1J_P08+1730 TCGTTTAGTAIACAGGCTTTG
79. 5PW1J_P09+59 GCCTCAGTTGYCCCAGCTATA
80. 5PW1J_P09+145 AGCAAAATGCWCTATGCACTG
81. 5PW1J_P09+892 GTGTCCTGACfTTGCACTCCAO / ACACTGCCTG
82. 5PW1J_P10+1070 ATCAGATAACECCTACACTTA
83. 5PW1J_P10+1511 TCTCTCTTCTSCCTGCCCTGT
84. 5PW1J_P09+1132 TGGACACAGGKAGGGGAATAT
85. 5PW1J_P09+1688 TOTCACTTGCRCATACAAGGC
86. 5PW1J_P09+1900 ATCATCAGATXAGCCCAGAAT
87. 5PW1J W1R1-1060 TCAACAGAGAEAGTTAATGGT
88. 5PW1J W1R1-1831 AGCAATAATGXTTCCCTTTTC
89. 5PW1J W1R1-2355 TCTAGCTTTTXTGTGTTTTTT
90. 5PW1J W1R1-3160 GATTCCTTAAXGCTTGATACT
91. 5PW1J W1R1-3787 CCTCCTCCAGXACCAAAGTGG
92. W1J_CD+24 ATGGCCACAGRTCAAATCCTG
93. W1J_CA+564 ACTGAGTGTTXATGCCAATTT
5' to ASTH1I
94. WI_CL+94 GACAAGCCCTRTCTGACACAC
95. WI_CN+134 TGAAAAGCCTXCTTGCTGCCT
96. WI_CQ-28 TCCTGGAGTTXCTTTGCTCCC
97. WI_CQ+39 GATTCCAAATHAACTAAAGAT
98. P14-16+191662 GACCTCAAGTCRTCCACCCGCC
99. P14-16+192592 AACAAATACTMCCCCGCAACCC
100. P14-16+192762 ATTTTTTTTTI/-AAGGAAAATA
101. P14-16+195066 AAATTTCCCCMAAACAAGCAG
102. P14-16+196590 GAGAAAGGGTRTGTGTGTGTG
103. P14-16+196617 GTGTGTGTGTGT^/SIGIATGTGCGCGTG
104. P14-16+196902 ATCGGGAACCXCATACCCCAA
105. P14-16+198040 TTTGTTTCGCMATGAGGTACG
106. P14-16+198240 TGAGGGTGTTfiTGGGCTGGAC
107. P14-16+198840 TCTTCATTGGXATCTGAATGT
108. P14-16+200120 GCGAGCACCTXTGCAGCATGA
109. P14-16+200617 AACCCCCCCCMCACACACACA
110. J5-16+4454 TCAGTGCTCTfiTAATCAGTCA
111. J5-16+4825 TCTTTGTGAAA-/(fiMAATTAGTCTG*
112. J5-16+5426 GCTGCCCTGASAGCTGGGCCA
113. J5-16+5623 CCTTCTGATCXTTGTTTGCTG
114. J5-16+7386 GGAACACTGAKTCTTGATTAG
115. J5-16+7904 TAGGCTTCTCXTGATAATTGA
116. J5-16+8055 TCTTAAAATAMTTGGCTTGTA
117. J5-16+10595 TAGATCATTARTAACCTGCCT
118. J5-16+11140 ATGAGGGGAAMAAGAAACTAC
119. J5-16+12004 TTGACACCAGRAACCCCCCAG
120. J5-16+12219 TGTTTTAAATRTTAGGGACAA
121. J5-16+12303 GTAAGCATAGYAATGTAGCAG
122. J5-16+13504 GGCTCTTTCTKCAACCTTTCC
123. J5-16+14120 GACCCAGGTTRTGAGTTTTCC
124. ASTH1I, exon B +169 GACAGAATGAXATATGAAAAG
125. ASTH1I, exon I +69 TGTGTGACACXGAGAAGCCCA
126. ASTH1J, exon C +56 AGTACTGGACMAAGTACCAGG
127. 5' ASTH1J, WI_Cg -9 IGGGAGCASGTATTGCATT
ASTH1J Intron A
128. WIJ_Ia01 +39 AGATTTGAGGXCTCAGGTCCC
129. WIJ_Ia01 +140 1 G i LAAl G 1 CisCA i GA xAAGC
130. WIJ_Ia01 +678 1 rGCCCCAGTJ^TTCTCCGGGC
131. WIJ_Ia01 +855 i ATGAGCAGCSTAGGGAGTGG
132. WIJ_Ia01 +929 AGTTGACTGA(AAAA)/-TAAATAAGAC
133. WIJ_Ia[illegible] +362 ATTCAAATAGSCTCTAGAAAC
134. WIJ_Ia[illegible] +918 CCCAGAATTTJ5ATATCCATTC
135. WIJ_Ia03 [position illegible] TGACCCAACAEAAACTCACTG
136. WIJ_Ia03 [position illegible] CCAGAATATAMCATCAGCCCT
137. WIJ_Ia03 [position illegible] CATCAGCCCTHCTGAGGAGAT
138. WIJ_Ia02 [position illegible] CCAGAACAGAXTTTATTCTGT
139. WIJ_Ia02 [position illegible] TTCAGCCATCXTTCCAGTTGT
140. WIJ_Ia02 [position illegible] TCACTAACTCJiAAAACGACAT
141. WIJ_Ia02 [position illegible] AACTCAAAAAXGACATCCTCC
142. WIJ_Ia[illegible] [position illegible] GAACTGCACAfiGTTGCACACT
143. WIJ_Ia[illegible] [position illegible] TTGTTCCATGSACTACCTCCT
144. WIJ_Ia[illegible] [position illegible] ACAGCAGGCAXTCAACAAATT
145. WIJ_Ia[illegible] [position illegible] TTATTTTTGGSTTTGTTTTAA
146. WIJ_Ia04 [position illegible] TAGGCTGTTCXCTGCCATCAC
147. WIJ_Ia05 [position illegible] GTGCTCTGGGMCACACAGCTC
148. WIJ_Ia05 +1103 AGACCCGATARGAGCTCCTTC
149. WIJ_Ia05 +1823 CATCTTGCGCSGTCATGTAAG
150. WIJ_Ia05 +1852 CAGCACAGCTSTTCCCTCAAA
151. WIJ_Ia05 +1906 TTTGGAAACAjlGGTGAAGTAT
152. WIJ_Ia05 +19113 ACACGGTGAAfiTATTGTCTCC
153. WIJ_Ia06 +794 AAAAGTGGATUCTCTGCAAAC
154. WIJ_Ia06 +814 CTTCAAATGCRGCTATTAAAG
155. WIJ_Ia06 +1197 CCTGGGAGCAXGGTAAATCAG
156. WIJ_Ia06 +1231 TGAAAATGTCSCTTTCTCACCT
157. WIJ_Ia06 +1256 CCTGATATTTECCAACAAGAA
158. WIJ_Ia06 +1535 AAAGGGTTAGXTTGTCCCCTT
159. WI Caa +163 TGAAAATAAAASACAATTTTTT

The sequences are listed with the variant residues represented by the appropriate single letter
designation, i.e. A or G is shown by "R". The variant residues are underlined. Where the
polymorphism is a deletion, the deleted residues are underlined, and the alternative form is shown
as "-".

Where intron 'a' is the intron 3' to exon 'a', etc.

Position numbers correspond to the position within the intron or exon, with nucleotide +1 being the
5'-most base of the exon or the intron. Alternatively, negative numbers denote the number of bases
from the 3' end of an intron.

Position in cDNA = position # for the exon a form of ASTH1J or the exon i form of ASTH1I.
Exonic sequences are uppercase, intronic sequences lower case.
UTR = untranslated region. N/A = not applicable.
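
The single-letter convention described in the table notes can be sketched as follows. The mapping uses the standard two-base IUPAC nucleotide codes; the function name is hypothetical, and the sketch assumes exactly one ambiguity code per flanking sequence, so it does not handle the deletion ("-") entries or entries whose variant letter did not survive reproduction.

```python
# Standard two-base IUPAC nucleotide ambiguity codes.
IUPAC = {"R": "AG", "Y": "CT", "K": "GT", "M": "AC", "S": "CG", "W": "AT"}


def expand_alleles(seq: str) -> list[str]:
    """Return both allele sequences for a flank containing exactly one ambiguity code."""
    positions = [i for i, base in enumerate(seq) if base in IUPAC]
    if len(positions) != 1:
        raise ValueError("expected exactly one two-base ambiguity code")
    i = positions[0]
    return [seq[:i] + allele + seq[i + 1:] for allele in IUPAC[seq[i]]]


# Flanking sequence of the INTRON D (+)15 entry of Table 3, where K denotes G or T.
print(expand_alleles("GTAACCAAGCKCAAGCCACCC"))
```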

Cross-species sequence conservation 
Cross-species sequence conservation can reveal the presence of
functionally important areas of sequence within a larger region. Approximately 90
kb of sequence lies between ASTH1I and ASTH1J, which are transcribed in opposite
directions (Figure 1). The transcriptional orientation of these genes may allow
coordinate regulation of their expression. The expression patterns of these genes
are similar but not identical. Sequences found 5' to genes are critical for
expression. To search for regulatory or other important regions, the genomic
sequence between ASTH1I and ASTH1J was examined, and plasmid clones
derived from genomic sequencing experiments were chosen for cross-species
hybridization experiments. The criterion for probe choice was a lack of repeat
elements such as Alu or LINEs. Inserts from these clones were used as probes on
Southern blots of EcoRI-digested human, mouse and pig or cow genomic DNA.
Probes that produced discrete bands in more than one species were considered
conserved.

Conserved probes clustered in four locations. One region was located 5' to
ASTH1I and spanned exon j of this gene. A second conserved region was located
5' to ASTH1J, spanning approximately 10 kb and beginning 6 kb 5' to ASTH1J
exon a (and is within SEQ ID NO:1). Two other clusters of conserved probes were
noted in the region between ASTH1I and J. They are approximately 10 and 6 kb in
length.

Promoters, enhancers and other important control regions are generally
found near the 5' ends of genes or within introns. Methods of identifying and
characterizing such regions include: luciferase assays, chloramphenicol acetyl
transferase (CAT) assays, gel shift assays, DNase I protection assays (footprinting),
methylation interference assays, DNase I hypersensitivity assays to detect
functionally relevant chromatin-free regions, other types of chemical protection
assays, transgenic mice with putative promoter regions linked to a reporter gene
such as β-galactosidase, etc. Such studies define the promoters and other critical
control regions of ASTH1I and ASTH1J and establish the functional significance of
the evolutionarily conserved sequences between these genes.



Discussion 

The ASTH1 locus is associated with asthma and bronchial hyperreactivity.
ASTH1I and ASTH1J are transcription factors expressed in trachea, lung and
several other tissues. The main site of their effect upon asthma may therefore be in
trachea and lung tissues. Since ets family genes are transcription factors, a
function for ASTH1I and ASTH1J is activation of transcription of particular sets of
genes within cells of the trachea and lung. Cytokines are extracellular signalling
proteins important in inflammation, a common feature of asthma. Several ets family
transcription factors activate expression of cytokines or cytokine receptors in
response to their own activation by upstream signals. ELF, for example, activates
IL-2, IL-3, IL-2 receptor α and GM-CSF, factors involved in signaling between cell
types important in asthma. NET activates transcription of the IL-1 receptor
antagonist gene. ETS1 activates the T cell receptor α gene, which has been linked
to atopic asthma in some families (Moffatt et al. (1994) supra).

Activation of genes involved in inflammation by other members of the ets
family suggests that the effect of these ASTH1 genes on development of asthma is
exerted through influencing cytokine or receptor expression in trachea and/or lung.
Cytokines are produced by structural cells within the airway, including epithelial
cells, endothelial cells and fibroblasts, bringing about recruitment of inflammatory
cells into the airway.

A model for the role of ASTH1I and ASTH1J in asthma that is consistent with
the phenotype linked to ASTH1, the expression pattern of these genes, the nature
of the ASTH1I/J genes, and the known function of similar genes is that aberrant
function of ASTH1I and/or ASTH1J in trachea or lung leads to altered expression of
factors involved in the inflammatory process, leading to chronic inflammation and
asthma.

Functional analysis of an ASTH1J promoter sequence variant and location of the
ASTH1J promoter

Primer extension analyses performed using total RNA isolated from both
bronchial and prostate epithelial cells have revealed one major and five minor
transcription start sites for ASTH1J. The major site accounts for more than 90% of
ASTH1J gene transcriptional initiation. None of these sites is found when the
primer extension analysis is performed using mRNA isolated from human lung
fibroblasts that do not express ASTH1J.

Identification of the ASTH1J transcriptional start site has allowed the
localization of a putative TATA box (TTTAAAA) between positions -24 and -30 (24
to 30 bp 5' to the transcription start site). Although the sequence is not that of a
typical TATA box, it conforms to the consensus sequence (TATAAAA) for TATA box
protein binding as compared with 389 TATA elements (Transfac database:
http://transfac.gbf-braunschweig.de/, ID: V$TATA_01).
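
The position convention used above (negative numbers counting upstream from the transcription start site) and the one-base difference between TTTAAAA and the TATAAAA consensus can be sketched as follows. The toy sequence is hypothetical filler built so that the motif falls at -30 to -24; it is not the ASTH1J promoter sequence, and the helper names are assumptions.

```python
def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def relative_position(index: int, tss_index: int) -> int:
    """Convert a 0-based index to a start-site-relative position (+1 is the start site; there is no 0)."""
    offset = index - tss_index
    return offset if offset < 0 else offset + 1


def motif_positions(seq: str, tss_index: int, motif: str) -> list[tuple[int, int]]:
    """Return (first, last) start-site-relative positions of every occurrence of motif."""
    hits = []
    start = seq.find(motif)
    while start != -1:
        last = start + len(motif) - 1
        hits.append((relative_position(start, tss_index), relative_position(last, tss_index)))
        start = seq.find(motif, start + 1)
    return hits


# Hypothetical toy promoter: 10 filler bases, the motif, 23 filler bases, then the start site.
toy = "G" * 10 + "TTTAAAA" + "C" * 23 + "ATG"
print(motif_positions(toy, tss_index=40, motif="TTTAAAA"))
print(hamming("TTTAAAA", "TATAAAA"))
```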

Analysis of the CAAT box "G" polymorphism by gel shift assay

Binding of nuclear proteins to a polymorphism in the GCCAAT motif
(GCCAAT or GCCAGT) found at position -140 (140 bp 5' to the transcription start of
ASTH1J as defined by primer extension experiments, previously referred to as
"-165 bp"), has been assessed using electrophoretic mobility shift assays. These
experiments showed a clear difference between binding of nuclear
proteins to radioactively labeled double stranded oligonucleotides containing the
normal "A" and those containing the mutant "G" nucleotide. A specific set of nuclear
proteins was able to bind to the normal oligonucleotide, but did not bind to the "G"
oligonucleotide. The specificity of the DNA binding complexes was further
addressed by competition with either normal or mutant unlabeled oligonucleotides.
Addition of increasing amounts of normal unlabeled oligonucleotide effectively
competed binding of nuclear proteins to the labeled normal oligonucleotide, while
the addition of increasing amounts of unlabeled "G" oligonucleotide did not.

The GCCAAT cis-element is found in many promoters at various locations
relative to genes, as well as in distal enhancer elements. There is no known
correlation between location of these elements and activity. Both positive and
negative regulatory trans-acting factors are known to bind this class of cis element.
These factors can be grouped into the NF-1 and C/EBP families.

The nuclear factor-1 (NF-1) family of transcription factors comprises a large
group of eukaryotic DNA binding proteins. Diversity within this gene family is
contributed by multiple genes (including: NF-1A, NF-1B, NF-1C and NF-1X),
differential splicing and heterodimerization.

Transcription factor C/EBP (CCAAT-enhancer binding protein) is a heat
stable, sequence-specific DNA binding protein first purified from rat liver nuclei.
C/EBP binds DNA through a bipartite structural motif and appears to function
exclusively in terminally differentiated, growth arrested cells. C/EBP α was originally
described as NF-IL-6; it is induced by IL-6 in liver, where it is the major C/EBP
binding component. Three more recently described members of this gene family,
designated CRP1, C/EBP β and C/EBP δ, exhibit similar DNA binding specificities
and affinities to C/EBP α. Furthermore, C/EBP β and C/EBP δ readily form
heterodimers with each other as well as with C/EBP α.

Members of the C/EBP family of transcription factors, but not members of the
NF-1 family, bind to the ASTH1J promoter region, as determined by the use of
commercially available antibodies (Santa Cruz Biotechnologies, Santa Cruz, CA)
that recognize all NF-1 and C/EBP family members known to date, in
electrophoretic mobility shift assays.

Fabricating a DNA array of polymorphic sequences

A DNA array is made by spotting DNA fragments onto glass microscope slides
which are pretreated with poly-L-lysine. Spotting onto the array is accomplished by
a robotic arrayer. The DNA is cross-linked to the glass by ultraviolet irradiation, and
the free poly-L-lysine groups are blocked by treatment with 0.05% succinic
anhydride, 50% 1-methyl-2-pyrrolidinone and 50% borate buffer.

The spots on the array are oligonucleotides synthesized on an ABI
automated synthesizer. Each spot is one of the alternative polymorphic sequences
indicated in Tables 3 to 8. For each pair of polymorphisms, both forms are
included. Subsets include (1) the ASTH1J polymorphisms of Table 3; (2) the
ASTH1I polymorphisms of Table 3; and (3) the polymorphisms of Table 4. Some
internal standards and negative control spots including non-polymorphic coding
region sequences and bacterial controls are included.

Genomic DNA from patient samples is isolated, amplified and subsequently
labeled with fluorescent nucleotides as follows: isolated DNA is added to a standard
PCR reaction containing primers (100 pmoles each), 250 µM nucleotides, and
5 Units of Taq polymerase (Perkin Elmer). In addition, fluorescent nucleotides
(Cy3-dUTP (green fluorescence) or Cy5-dUTP (red fluorescence), sold by
Amersham) are added to a final concentration of 60 µM. The reaction is carried out
in a Perkin Elmer thermocycler (PE9600) for 30 cycles using the following cycle
profile: 92°C for 30 seconds, 58°C for 30 seconds, and 72°C for 2 minutes.
Unincorporated fluorescent nucleotides are removed by size exclusion
chromatography (Microcon-30 concentration devices, sold by Amicon).
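
As a small worked example of the cycle profile above, the sketch below records the three steps as data and totals the programmed hold time. The step labels ("denature", "anneal", "extend") and the function name are assumptions added for readability; ramp times between temperatures, which depend on the instrument, are not included.

```python
# (label, temperature in degrees C, hold time in seconds); labels are illustrative.
CYCLE_PROFILE = [("denature", 92, 30), ("anneal", 58, 30), ("extend", 72, 120)]
CYCLES = 30


def total_hold_minutes(profile: list[tuple[str, int, int]], cycles: int) -> float:
    """Return the total programmed hold time in minutes, excluding ramp times."""
    seconds_per_cycle = sum(seconds for _, _, seconds in profile)
    return cycles * seconds_per_cycle / 60


print(total_hold_minutes(CYCLE_PROFILE, CYCLES))
```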

Buffer replacement, removal of small nucleotides and primers and sample
concentration are accomplished by ultrafiltration over an Amicon microconcentrator-30
(mwco = 30,000 Da) with three changes of 0.45 ml TE. The sample is reduced
to 5 µl and supplemented with 1.4 µl 20X SSC and 5 µg yeast tRNA. Particles are
removed from this mixture by filtration through a pre-wetted 0.45 µm microspin filter
(Ultrafree-MC, Millipore, Bedford, MA). SDS is added to a 0.28% final
concentration. The fluorescently-labeled cDNA mixture is then heated to 98°C for 2
min., quickly cooled and applied to the DNA array on a microscope slide.
Hybridization proceeds under a coverslip, and the slide assembly is kept in a
humidified chamber at 65°C for 15 hours.

The slide is washed briefly in 1X SSC and 0.03% SDS, followed by a wash in
0.06X SSC. The slide is kept in a humidified chamber until fluorescence scanning
is done.

Fluorescence scanning and data acquisition. Fluorescence scanning is set
for 20 microns/pixel and two readings are taken per pixel. Data for channel 1 is set
to collect fluorescence from Cy3 with excitation at 520 nm and emission at 550-600
nm. Channel 2 collects signals excited at 647 nm and emitted at 660-705 nm,
appropriate for Cy5. No neutral density filters are applied to the signal from either
channel, and the photomultiplier tube gain is set to 5. Fine adjustments are then
made to the photomultiplier gain so that signals collected from the two spots are
equivalent.



Construction of an asth1-J Transgenic Mouse
Isolation of mouse asth1-J genomic fragment:

Phage MW1-J was isolated by screening a mouse 129Sv genomic phage
library (Stratagene) with the 443bp BamHI-SmaI fragment from the 5' region of the
human asth1-J cDNA clone PA1001A as probe. The 23kb insert in MW1-J was
sequenced.

Assembly of asth1-Jexb targeting construct:

A 2.65kb SacI fragment (bp7115-bp9765) from MW1-J was isolated, cloned
into the SacI site of pUC19, isolated from the resultant plasmid as an EcoRI-XbaI
fragment, inserted into the EcoRI-XbaI sites of pBluescriptII KS+ (Stratagene), and
the 2.5kb XhoI-MluI fragment isolated. A 5.4kb HindIII fragment (bp11515-bp16909)
was isolated from MW1-J, inserted into the HindIII site of pBluescriptII KS+,
reisolated as a XhoI-NotI fragment, inserted into the XhoI-NotI sites of pPNT, and
the 9.5kb XhoI-MluI fragment isolated. The two XhoI-MluI fragments were ligated
together to produce the final targeting construct plasmid, asth1exb. Asth1exb was
linearized by digestion with NotI and purified by CsCl banding.

Identification of targeted ES clones:

Approximately 10 million RW4 ES cells (Genome Systems) were
electroporated with 20 µg of linearized asth1exb and grown on mitomycin C
inactivated MEFs (Mouse Embryo Fibroblasts) in ES cell medium (DMEM + 15%
fetal bovine serum + 1000 U/ml LIF (Life Technologies)) and 400 µg/ml G418. After
24-48 hrs, the cells were refed with ES cell medium. After 7-10 days in selection
culture approximately 200 colonies were picked, trypsinized, grown in 96 well
microtiter plates, and expanded in duplicate 24 well microtiter plates. Cells from
one set of plates were trypsinized, resuspended in freezing medium (Joyner, A.,
ed., Gene Targeting, A Practical Approach. 1993. Oxford University Press), and
stored at -85°C. Genomic DNA was isolated from the other set of plates by standard
methods (Joyner, supra). Approximately 10 µg of genomic DNA per clone were
digested with NdeI and screened by Southern blotting using a 100 bp fragment
(bp6164-bp6260) as probe. A banding pattern consistent with targeted
replacement by homologous recombination at the asth1-J locus was detected in 10
of 113 clones screened.

Production of asth1-J knockout mice:

Two of the targeted clones, cl#117 and cl#58, were expanded and injected
into C57BL/6 blastocysts according to standard methods (Joyner, supra). High
percentage male chimeric founder mice (as ascertained by extent of agouti coat
color contribution) were bred to A/J and C57BL/6 female mice. Germline
transmission was ascertained by chinchilla or albino coat color offspring from A/J
outcrosses and by agouti coat color offspring from C57BL/6 outcrosses. The NdeI
Southern blot assay employed for ES cell screening was used to identify germline
offspring carrying the targeted allele of Asth1-J. Germline offspring from both A/J
and C57BL/6 outcrosses were identified and bred with A/J or C57BL/6 mates
respectively.

Mice heterozygous for the Asth1-J targeted allele are interbred to obtain
mice homozygous for the asth1-J targeted allele. Homozygotes are identified by
the NdeI Southern blot screening described above. The germline offspring of the
chimeric founders are 50% A/J or C57BL/6 and 50% 129SvJ in genetic background.
Subsequent generations of backcrossing with wild type A/J or C57BL/6 mates will
result in halving of the 129SvJ contribution to the background. The percentage A/J
or C57BL/6 background is calculated for each homozygous mouse from its
breeding history.
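
The halving described above can be written as a short calculation. This is a minimal sketch of the expected (average) background only, assuming the germline offspring of the chimeras start at 50% 129SvJ and that each later generation is a backcross to the wild type strain; the function name is hypothetical, and individual mice will deviate from the expectation.

```python
def expected_background(backcrosses: int) -> tuple[float, float]:
    """Return expected (% A/J or C57BL/6, % 129SvJ) after further backcross generations.

    backcrosses = 0 corresponds to the germline offspring of the chimeric founders.
    """
    if backcrosses < 0:
        raise ValueError("number of backcrosses cannot be negative")
    donor = 50.0 / (2 ** backcrosses)  # the 129SvJ share halves with each backcross
    return 100.0 - donor, donor


for n in range(4):
    print(n, expected_background(n))
```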

Molecular and cellular analysis of homozygous mice:

Various tissues of homozygotes, heterozygotes and wild type littermates at
various stages of development from embryonic stages to mature adults are isolated
and processed to obtain RNA and protein. Northern and western expression
analyses as well as in situ hybridizations and immunohistochemical analyses are
performed using cDNA probes and polyclonal and/or monoclonal antibodies specific
for asth1-J protein.



Phenotypic analysis of homozygous mice:
A/J, C57BL/6, wild type, heterozygous and homozygous mice in both A/J and
C57BL/6 backgrounds at varying stages of development are assessed for gross
pathology and overt behavioral phenotypic differences such as weight, breeding
performance, alertness and activity level, etc.

Methacholine challenge tests are performed according to published protocols
(De Sanctis et al. (1995). Quantitative Locus Analysis of Airway
Hyperresponsiveness in A/J and C57BL/6J mice. Nat. Genet. 11:150-154).

Targeting at asth 1 -J axon C: 
Assembly ofexon C targeting construct: 

A 3.2kb HindIII-XbaI fragment (bp11515-bp14752) from MW1-J was isolated, 
cloned into the HindIII-XbaI site of pUC19, isolated from the resultant plasmid as a 
KpnI-XbaI fragment, inserted into the KpnI-XbaI sites of pBluescriptII KS+ 
(Stratagene), and the 4.5kb RsrII-MluI fragment isolated. A 3.4kb HindIII fragment 
(bp17217-bp20622) was isolated from MW1-J, inserted into the HindIII site of 
pBluescriptII, reisolated as a XhoI-NotI fragment, inserted into the XhoI-NotI 
sites of pPNT, and the 9.5kb RsrII-MluI fragment isolated. The two RsrII-MluI 
fragments were ligated together to produce the final targeting construct plasmid, 
Asth1exc. Asth1exc was linearized by digestion with NotI and purified by CsCl 
banding. 
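
As a consistency check on the homology arms, the stated fragment sizes can be compared with the base-pair coordinates. The sketch below assumes the coordinates read bp11515-bp14752 and bp17217-bp20622 and that both endpoints are inclusive; the names are hypothetical and nothing here reflects software used by the applicants.

```python
def fragment_length(start_bp: int, end_bp: int) -> int:
    """Length in bp of a fragment given inclusive start and end coordinates."""
    if end_bp < start_bp:
        raise ValueError("end_bp must not be smaller than start_bp")
    return end_bp - start_bp + 1


# Coordinates on MW1-J as read from the text (assumed inclusive).
HOMOLOGY_ARMS = {
    "HindIII-XbaI fragment": (11515, 14752),
    "HindIII fragment": (17217, 20622),
}


def arm_sizes_kb() -> dict:
    """Return each arm's size in kb, rounded to one decimal place."""
    return {
        name: round(fragment_length(start, end) / 1000, 1)
        for name, (start, end) in HOMOLOGY_ARMS.items()
    }


if __name__ == "__main__":
    for name, size in arm_sizes_kb().items():
        print(f"{name}: {size} kb")
```

Under these assumptions the coordinates give 3238 bp and 3406 bp, in agreement with the 3.2kb and 3.4kb sizes quoted for the two fragments.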


Identification of targeted ES clones: 

Approximately 10 million RW4 ES cells (Genome Systems) were 
electroporated with 20 µg of linearized Asth1exc and grown on mitomycin C 
inactivated MEFs (Mouse Embryo Fibroblasts) in ES cell medium (DMEM + 15% 
fetal bovine serum + 1000 U/ml LIF (Life Technologies)) and 400 µg/ml G418. After 
24-48 hrs, the cells were refed with ES cell medium. After 7-10 days in selection 
culture approximately 200 colonies were picked, trypsinized, grown in 96 well 
microtiter plates, and expanded in duplicate 24 well microtiter plates. Cells from one 
set of plates were trypsinized, resuspended in freezing medium (Joyner, supra), 
and stored at -85°C. Genomic DNA was isolated from the other set of plates by 
standard methods (Joyner, supra). Approximately 10 µg of genomic DNA per clone 
was digested with NcoI and screened by Southern blotting using a 518bp fragment 
(bp8043-bp8560) as probe. A banding pattern consistent with targeted replacement 
by homologous recombination at the Asth1-J locus was detected in 3 of 46 clones 
screened. 
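
The screening result (3 targeted clones of 46 screened) corresponds to a targeting frequency of roughly 6.5%. The sketch below computes that frequency together with a Wilson score interval; the interval is an illustrative addition that is not part of the specification, and the function name and the choice of z = 1.96 (an approximate 95% level) are assumptions.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 ~ 95% level)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials and trials > 0")
    p = successes / trials
    denom = 1 + z ** 2 / trials
    centre = (p + z ** 2 / (2 * trials)) / denom
    half_width = z * math.sqrt(
        p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)
    ) / denom
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    targeted, screened = 3, 46  # counts reported in the text
    low, high = wilson_interval(targeted, screened)
    print(f"targeting frequency: {targeted / screened:.1%}")
    print(f"approximate 95% interval: {low:.1%} to {high:.1%}")
```

With so few positive clones the interval is wide, which is a reason to treat the observed frequency as a rough guide when planning how many colonies to pick.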

Targeted clones are injected into blastocysts and high percentage chimeras 
bred to A/J and C57BL/6 mates analogously to that done for asth1-Jexb knockout 
mice. Heterozygote, homozygote and wild type littermates are obtained and 
analyzed analogously to that done for asth1-Jexb knockout mice. 



The data presented above demonstrate that ASTH1I and ASTH1J are novel 
human genes linked to a history of clinical asthma and bronchial hyperreactivity in 
two asthma cohorts, the population of Tristan da Cunha and a set of Canadian 
asthma families. A TDT curve in the ASTH1 region indicates that ASTH1I and 
ASTH1J are located in the region most highly associated with disease. The genes 
have been characterized and their genetic structure determined. Full length cDNA 
sequences for three isoforms of ASTH1I and three isoforms of ASTH1J are reported. 
The genes are novel members of the ets family of transcription factors, which have 
been implicated in the activation of a variety of genes including the TCRα gene and 
cytokine genes known to be important in the aetiology of asthma. Polymorphisms 
in the ASTH1I and ASTH1J genes are described. These polymorphisms are useful 
in the presymptomatic diagnosis of asthma susceptibility, and in the confirmation of 
diagnosis of asthma and of asthma subtypes. 

All publications and patent applications cited in this specification are herein 
incorporated by reference as if each individual publication or patent application 
were specifically and individually indicated to be incorporated by reference. The 
citation of any publication is for its disclosure prior to the filing date and should not 
be construed as an admission that the present invention is not entitled to antedate 
such publication by virtue of prior invention. 
Although the foregoing invention has been described in some detail by way 
of illustration and example for purposes of clarity of understanding, it will be readily 
apparent to those of ordinary skill in the art in light of the teachings of this invention 
that certain changes and modifications may be made thereto without departing from 
the spirit or scope of the appended claims. 

SEQUENCE LISTING 

(1) GENERAL INFORMATION 

(i) APPLICANT: AxyS Pharmaceuticals, Inc. 
(ii) TITLE OF THE INVENTION: Asthma Related Genes 
(iii) NUMBER OF SEQUENCES: 339 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Bozicevic & Reed, LLP 

(B) STREET: 285 Hamilton Ave, Suite 200 

(C) CITY: Palo Alto 

(D) STATE: CA 

(E) COUNTRY: USA 

(F) ZIP: 94301 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Diskette 

(B) COMPUTER: IBM Compatible 

(C) OPERATING SYSTEM: DOS 

(D) SOFTWARE: FastSEQ for Windows Version 2.0 

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 21-JAN-1998 

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Sherwood, Pamela J 

(B) REGISTRATION NUMBER: 36,677 

(C) REFERENCE/DOCKET NUMBER: SEQ-4P 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 650-327-3231 

(B) TELEFAX: 650-327-3231 

(C) TELEX: 

(2) INFORMATION FOR SEQ ID NO:1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 72928 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1: 
GCACTTTTTG GGGAAGGTGG AAGAATAAAA GTAAGGGAGG TGTGCTGAGA CTTCAATTTT 60 
AATATCTTAT TTCTTAGGTT GAGTGTTACA CAGGCATTTG TAATCATATA TACTTTTGTA 120 

CACTTGAAAT ATATATATTT GTGTGTGTGT GTGTGTGTGT GTCAGAGTCT CACTCTGTCT 180 

CCCAGGCTGG AGTGCAGTGG TGTGATCTTG GCTCATTGCA ACCTCCACCT CCCAGGTTCA 240 

AGAGCTTTTT GTGCCTCCAT CTCCTGAGTA GCTGAGACTA CAGGCAAGCA CCACCACACC 300 

GGCTAATTTT TGTATTTTTA GTAGAGATGG GGTTTCACCA TGTTGCCCAG GCTGGTCTCA 360 

AATTCCTGGC CTCAAGTGAT CCAGTCACCT TGGCCTCCCA AAGTGCTGGA ATTACAGGCG 420 

TGAGCCACCA TGCCCGGTCT GAAATATTTC AAAATGTAAA AAAGCTAAAC CCAAATCCAG 480 

ATGTCTACTT TCAAGGTGCT CACAGGTCAG ATCTAGGATT ATTGCTACTA ACTGATATTT 540 

ATTATCCCAG CACCAGCATG TTTGGCTGTG TGTCATGGGT AAGTTACTCA CCTTCTCTGC 600 

GACAGTGTCA TCATTGTAAA ATAGGGATAA AAGAGTTTAG ACCTTGCAGA GTCCTTCAGA 660 

TTAAAGGAGA TAATCAGTAC GTGGCACTGA GTACCTGCAA TATATTAAGT GGTGTGTGCT 720 

CAGAGATATG ATCACATACA GTATCTTGGA TCTGCCCAGC AACTCTATGA AGATGAGGAA 780 

ACAGACTCAG GCAGGTCAGA GCCAGAACAT AATGTTTCTG GAATTTGAAC GTAAACGTTC 840 

CCCTTTCTCT TATCCAGGCT GAGTGCTAAA GGAATTGTAA AAATGGAATT TGCCTGTTGC 900 

CTGCATCTCC CTCTCTTTTT CTTCCTCTGT GTCCTCTGAA TATCTAGCAC CAGTGGGACT 960 

TTACAGTGTT GGCCTCAATG CTGTAGGGTG CTGTGTGCAC ACTTGTCTTC AGCTCCCTGA 1020 

GTTAGCAGAG CATTGCCCCA ACTCTGCCCT CTGGCCAGCT CATGTGCCTT ACAACTTTCT 1080 

GTTGCCAGAA GAGAGCCCTG CTCATTCTCT AGACTCAACC AACAAAAGCT GCCTACCATT 1140 

TTCAGAATGC CAGTGGGCAG TGAGAAGTGC AGAGCTTGTG TCCTGAGCTT GGCAGCCATC 1200 

TTGCTTGGTG TTAACT^AAGA GTAATTAAGT GATCTCATAA AACTCAGTGG TGGAGGTTGT 1260 

GGTTCAGAGC AAGCTGGGTC AATGCCAAGG CTACTTTGGC TTCATCTGGT CCATAGCCCC 1320 

ACATTTCTCT TCTGATGGTT CAGTTCCGGG AATGAGAACC AGTCTGAGTG TAAGAAGACT 1380 

TGGGTTTGAA TCTGTCTCCT CCAATCACTA GCTGACCTTA GAAAAGTGAC TTAACCTCCC 1440 

GAGCTGCTAT TTCCTCATCT TAAATGGTGA TAGTAATCTT TCCTTACCTT AAGGTTGTTG 1500 

AGCAGCTTAA ATAATATT^T GAGTTGAAAG CTTTTTGTAT GATCTGTTAT TAGGAGTCCA 1560 

GATAGTGTTT TATAAACAAG AGGATAAAAA AAAAAAAAAA AAAAAAAACA GGATTCTGAA 1620 

GGCTGGACTC ATTGCATTCC TTGCAAACTA CCCACTGAGC CCCAACTCTT CCGTCAGCTC 1680 

AAAGTCACTT CTCAGAGCAA ACCAGATTGT CCTGAACCCA GCACTTGCCA ACATCTCCTC 1740 

CTCTTCCCTG ATGAAAACTC TGGGCTGGAG TTGTGGTGGG TGAGGGQAAG GCAGGATAAA 1800 

TCAAAAATTG ATGTTTTAAG AAAACTATGG TATTCTTGGA TGCAAAGGCA TGAGAATGAT 1860 

ACCTTAGACT TTGGGGCTTG GGGAAAAGGG TGGGGGGTGG CGAGGGATAA AAGACTACAC 1920 

ATTGGGTTTA GTGGACACTG CTCGGGTTAT GGGTGCACCA AAATCTCAGA AATCACCACT 1980 

AAAGAACTGA TTCAGGTAAC CAAACACCAC CTGTTCCCCA AAAACCTATT GAAATAAAAA 2040 

CAGAAAATTA A/UU^AAAGAA AACCTATGGT ATTCTTGGAA GAAGCACAGT GGTGAAGTGG 2100 

AGTAGACACA GATGTGGAAG TGATGTGAAC TTTGGTAAGT TGCTGAGCCT CTGAGGATGA 2160 

TTTCCCTCAT CTGTCAATCA GGGAACAAAA TCCCTTACTT GTACAATGAG TATTATAAAG 2220 

ATCAATTCAG ATGACGCATG TAAAGATGCA ATGTGGGACT GGTAGGTAGT AAGCATCCCA 2280 

TATU^TGGCAG CTATTAATAA GTAATAATCA CCGAGTGGTG GGCTGCCTTT CATGAAAACA 2340 

TTCCCAGCAA GCTGCTCTTC TGTCGGCTCA AAGTCACTTC TCAGAGTAAA TGAGATTGGC 2400 

CAGTTCTTTC TTTCCAAGGC TTTTCTGGAT ATTCATTTGT CCCAGATTTC TCCTGTATAC 2460 

AAAGCTCAGG AGTGAGGACC CCCACAGTGG GGCTTGCACA AGGATAGCCT TGGGGGGCTT 2520 

TTTCTAAGAG CTATGACTTT GAATGCTCTC TTCATCGATG CTGACAGATG AGGGCTGATG 2580 

6AAGT6GTCA TGTTTTAAAA TGTCTGATGT CCAGATUICAC AGAGATGTGT ACGCAAAACA 2640 

TTCATTCATT CAAGATGGAA TTAGTGCCCC AGACACAGAG GCAGGGGATA AATAGCAAAC 2700 

AAGGCTTGAT TCCTGCCTTC ATAGAGCTTA CTGTCTTGTA GGGGAAACAT GAGTAAATTC 2760 

AGCAGAGTAA GGGCTCTAAT TGGGTAAATG GGGGCTAGGC TGCCTGTGTC CTTGGGGTGG 2820 

TGGGAAGGCT GCTGATCTGG GGTGCCAGAA GACCTGAGTT TTGATGCAGG CTCTGTGACT 2880 

TTGAGCAGGT CGTTTCCAAC TTCTGAGCTT CCATTTCCCT AGCTGAAAAT GGGGGCTTGC 2940 

CATACTCGAT GCTGTACTCT ATGAGTCTTT GCAGCTCTGT CATCTTTTTT TCTTTTGGTC 3000 

ACTCAGAGAC TCCAGGATTG GGAGAACAAC CTGCATTCTG ATTTAAAGTG TGAATCTAAT 3060 

AATTTCAAAA AGAAAGGGAC TAAAAGGGAC AAACTTGTTT CTGTTTATTT TCCATCCTTC 3120 

TTTGGGGAAG TGTAACATTT GAAATCAAAT TCTCATTGGC TTAGCCAATG TGTAGACTTC 3180 

GAGGGGAAAT TCTCACTGCC CAGAGAAGTG ACTAAAAATG ACCATTACAG CCAAAAAGAG 3240 

AAGTTTTTTT TTTTTTAAAA TCTGTGCTCT ACAGATGGAT GAAGTGCTGC TGCACATGGA 3300 

CAGAGTGGAT CTGGACATTC TGCATGAGCC CAGGGATCCT GAGAATGGAT TGGCTGAGCA 3360 

TAGACAGGGT GACCTATCGA TGTTCACTGT GGTCCTGATC TATGTGGCCT CTTCCTAAGG 3420 
GAAGATTTTT CTTAAGGTTG TTTCCTTTCT CAGCAGATAT TT6TGAAGAA ACTGTATCTG 3480 
TAGTCTCATT TTGTCCTTAT AATGACCCTG ATGGATGGGA GGTAGAGGGA TGATGATCAG 3540 
TAAGAGCTGG GAAAGCACCA GGAACTAGCA AGAGCAGGAC ACCTTTTCCA CCACTAGGTA 3600 
AATGGACCTA GTGACTGCTG GCACCGTGGG TGAGGGGACT GCCTGGCAGG AGCTGTGGCC 3660 
GTAGCTAGGG GATTACAGCT ACGGCCACAA CTCTGGCCCT GTACGGAGGG AGTGGGGGAA 3720 
ATAAAGAGTT CATATCACTC CCCTCTTTCC CTGGAGTCTC CTGCTGGTAC CTTGCATTGG 3780 
CTGAGTCTAA CTGGAAGCCA GAGGGCAAAG GAGGTACCCT TTCCAGCTCT GCAATTCTCT 3840 
TCAGACAGGG CTGGGATTTC TGGAGAGAAT TTGCAGAATC AGAAAGCAGA GCTTTCCAAT 3900 
CAATGCCAAG CAAGAGACTC TGCAGACTCT CATAGCCTTG GGACCTGAGA AACCAGGTAT 3960 
CCAGTGAGCA GTCACTTAAG CCTGTTCACC TGGCCCTCTC TTACTTTCTC TCCTATAGCA 4020 

GCAGCAAAGG AGCGATGGGC CGAAC^GGACT TGCTGGGTAG AAGTGGACCC ACATTCTAAA 4080 

AAGGAATGGA AGAGAAACCT GATTTCTTTG ACTCGCCCTG TCCCTGAAGA TGAGGGGCAG 4140 

GCACAGACCA GCCCTCTCCA GAAAGACAAA TATATTCTTC CATTCATGGG AGGGGTAGTA 4200 

GAGACTAACA TTTGTTAAGT ATCTATTACA TGGGGGGTAT GGAGGTAGGC CCTTTGTGTG 4260 

TGTTGCCTCT TTTAATCCTT TGGTGATCAA CTCATGAAAA TAAACAGCTC CAGAGCCAGC 4320 

TGTCTTTGGA GGGTGTAGGC AGGCCCGGCT CTGGGAAACC TGGTGACACT GACCTAGTTT 4380 

GACTTCCAAA TCTTCTCTCT TCTTCGATTC TGGTGAGCCC CACTCTAGCC CCATAGTATG 4440 

TATGGCCAAG CACCCAGATA CTGCTTCCAT CAGGAGGAAA TAACATACCT GATGAATTTC 4500 

TTCACTCAAG GTGTTAGGAG CTTAATGTGT TTCCCCCGCC CCCCGCACCA AGAGAATTTG 4560 

TGTTTTCCAA GACAGTCAGA GAGTGGGTGG TGCTGAACTC AAAGGAGTGA ATCACTAATA 4620 

GTGGAATCCC AGGCATTCAG GGAGGTCCTA TTTCTGGGGT GGGTTCCTTC CTGACACTTC 4680 

ATTTTCTACA AAGGTGGCAG CCACCTATTG TCTCCAGAAA GGAGGCTGTC CCTGTGGGTG 4740 

TGGTGACGGT GGGAAAGGAG AGGCACCTGC AGGCTGAAGC CAAGATCACC TGATTTTCAA 4800 

AACCAAATCT GTCCCTACAA AGGAGAAGTG GCTTAAAAAT CCACACAGCC TCCCGAGTGG 4860 

AGGGAAGAAT TCCCTCTCCT CTCTGGAACA GGGTTCCCTT CACCCAGAAC ACGGTGCTGT 4920 

TGTTATGCAA TGTCCCTGTT GGCAAAGATA TTTGAGCCCC TTGTTTTCAG GTCTGTGTCA 4980 

TTTCCAAGAA AGAGCTGTGG CCTTTGAGTA GGACTGGGCT CCTGAATAGG GTCCCTGGTG 5040 

CCAAATGAGG GAGCCAAGAA AAGGCAGAGA AGAGGAAAGT CCTGACTTTT ACATGAAGAT 5100 

GAGACAGCCA GCCCTGTGGC AGCCAGATGG CAGTCCTGTT GCTCTGTAGT GGCCTTGGGG 5160 

TCAGACTAGG GGCAGAGCTG GGCTGAAGGC AGGAAGGCCA GGACAAGACA GGTGAGAAGG 5220 

GCAAAGTCTC CTGTAACCTG GTGAGAAAAT GTGGGCTAAG CCATTCTCAT CTGGAGCTGA 5280 

AGGCTTGGTG GAGAATGGCC CTCAACATTC AAGTTCACAC CCATGGATTT ATAAAAGGCA 5340 

GGGCTGGGGG GAAAGGTTTT TCCCATTATA CTTAATAACA TTATCAACAA CAATAATCAC 5400 

TACTATCATT TATTGAGCAT TGACTCAAAA GACAGTCCTT TTATGAAAAT TATTTACTTA 5460 

AATCCTTACA AAGCTTCTAT TCATTCACCC AACACATATT TATTGAGTTC CTACTATGAG 5520 

CCAGGCATTA TTCTAGGTGC TTAATTTAGA TCAAGGGACA AGACAGACAA AATCCCTGTT 5580 

CTGGTGGCAG GGCTACTACA TGCAATTAAC AGCACACAAC TCTAGGGGGA GCCACATACA 5640 

TGGGCCACCT TATGAATGGT GTGCCCTGAG GTTT^GCATC CTGGCAGCCC CTTTCTGTGA 5700 

CATTTGCATT CTAGTGAAGG GAGTCTAATA CCAATGAAGT AGATGTCATT ATCCCCTGAC 5760 

TACAGTTTAG GAAACAGAGA CACATAGGAA TTAAGTAACT TGCTGAGTTT TTCAGCCAAA 5820 

AATGACTGAC CCATGATTTA TACTGAAGTC AGTCCTTGCA ATTCACCTGT GCCACGTACT 5880 

TGCCTTTCTC TCCCTGGTGG GCACAGGGAA GAGGGAGTAG CCAGGCTGGC CAGATGA6TG 5940 

CTGGGCTGGC TGGCCCAGTA GAGGCACCAT GTCCTGACTG GGTGGACAAA GACTGGGTAG 6000 

GAGGTAACAG AQAATCCCTT GGTGAGTCTA ACTTAGCTAT AAGAAGGCTT GCTGAGAGCA 6060 

GCTGCCTCCA TGCAGAGGGT GGGGTGACCG GCCTTTAATC CTTCCCAGCT GAGGATTTAG 6120 

TCAAAGAAGC TTGTCTCTGG GGATAGCCTA TGGTCTTGAA GGGCCTGAGT TAGCTATTAG 6180 

TTCACCCATT TATTTAACAT TCATTCATTA TTTTTAAAAA ATTTCCTAGC TATGTTTGGG 6240 

GGCAGAGAAG TGGGTCCAGA GACCTAGAGG TTTGCAAGGG TAGCTTCTAA ACTCCTTTGG 6300 

TTCAGAACAG AATAGAAAGT GTCCTCGGGT GACCTTGGGT CTGCTTCCCA AGCAAATTGA 6360 

GCATACGCAG CCAGAACAAA GACTGCACTC TACTCTAGTG AGCTCAGCCT GCTAGGCTTG 6420 

GATCTAGATT TTATAGCAAT AAGCTTGGAG TCTCACCTTT GGGTCAGACA GAGTACTACC 6480 

CCAGACATGA GGTAGGGAGA GCCTAGTCTA TATTCCTCTG CCTTTGTCCA AGCCTGCTTT 6540 

GTCCTTCCTC TTGACGAGGA ATAAAGATGG CTTCTGGGTG TGCATCCCCT TCCTTCTTCC 6600 

ACCTGCAGAT GTACCTGTTT GTGTGCAGTG GGCTTCTGAG TCCTGGGCAG GGATGCCAGA 6660 

GACCGCAAGC CAGATGCTTG GGATGCCAAT CCTTGGGACT TTGAGGAGAA AGAGAGGTTC 6720 

TGAGGGGCAT CTGTCTATGG CACAGAGTCA AATGGAACAC ATGGAAGTCC CTTAGAAGGC 6780 
TGGTATCTAA GTGTTGGCCA CACAATGTCC GTTCTTCCTC CATTATTTGA ATTTCTCCTT 6840 

CTCTATCCTT CTATCTTTCT TGGCACCTTG AGCCAGGTCT GGGGTGAGAG AAGGGATGGT 6900 

GTAGGTGAAT TAGTGGTAGT TATTGGAGGA AGGCAATAAA CCCAGAAAAA GTGTCACGTG 6960 

ACTTCTTTCT TGGGCCCAGT GTGACGCTTC TAGTTAGGCT AACGTGGGTC TTGGGACTGT 7020 

TCCTGAGATT TTGTGGAAAA CTCTTTGTAT TTGTGCTGGT AACAGAAGGA AACCAGAGTT 7080 

AGGGCTGGTG GGATGAAGCA GTGGGAACAC TGATTTCTCC TTTTTTTCAG ATTCAGGGAT 7140 

TTCTGTCAGA GAGATCCGTG GGGGAGGGAT GGGATTGGGA GTGAGGAGAA TCCCTTTCCT 7200 

CTCCTCTCAC CATCTGGTGG TCCCCGTGCC CACGCACCAG CTCGTTGGAT GGACATTTTG 7260 

ATTCCCTTTIA GATGTACATT CTTCAAATCA TTGTTTGTCA TTAGCTCCCT GGAGAAAATG 7320 

GAGGGGCTGA GATATTAGTG AGAAAACATA AAGTTAATTG GGTGATGGAG ACTGGGAGAA 7380 

GGGGAATGTT AGAAGAAAGT GAGCGAGGTC TGCTAAAAGT GAACTTTATC TTCTTCTCAA 7440 

TTTTGCCTAA GACTCGTGTT GCCTGGGCAG TCTCTTTTTG GAAGAGAAAT TTTCATGACA 7500 

GTTTGGGCCA GAGATGGCAA ATAAATGCCT GACATGGTTG CTGCCAGCCC CTGTCTCCCG 7560 

ACACGTTCAC AAGGGTGCAC ACCACTTCTC CTCTCTGTGA CCATAGACTC AGACCCATTG 7620 

CAATCCAGCA TCCTGCATGG CCCCATTGGT CAGAGTTGAC ATTTGCAATG AAGCTGCTTC 7680 

CCTATGCCTG GTTAGGCCTT TTGCTATGAA TTCTCTGGAG TTAACTATTT CCAAGGGGCT 7740 

CCAACTTATT CTTGTGATTT CCACGGGATT TGGAGCCCCA GAAGACAATC CCATGTGGAT 7800 

TCACAAAATG CCCTCTAAAT TTGATGGCTG TCAGTGCATA CTAAGTATGA CTGACTCACT 7860 

GGTATCTGTT TCCTCCX3CTG ACACAGCTGG TTCTTAGGCT CGGCAGGAGT TTGGGCTGAG 7920 

ACCTCTCATT GCTCTATATT CCCTCTGTTA CTAATGAGGT GTTGTTCCTT AATTACTAGG 7980 

TGCTGGATAC TAG7ATTGCT TTTCTTTGTT TCAGGGGATT TAGCAAAGGG CTTATAAATA 8040 

TTTCTTGTGT CTGGCATGAA CTACCTGATT TTTTTATTCT TCAGGTCACT GAGCTGGCAA 8100 

TAAAGGCAAC TCAAAGTTAG CTGGGAATCA GAATGAAGGG 6GACTAGGAA AAGTGAT6CC 8160 

TAGAACACCA ACAGQTGTGG GATCATCTTC ATTGTACCTT TC7VGAGCCTA AGATATAAGT 8220 

CCTCTGGATA CTCTCTGCTT GTTTATTTAA AGGAAAAAAT AATCAGAATG TGGGAGAAAT 8280 

GGGTGCTTTG GGTAATTTCA TATTCTAATT GATGAACGTG TATGAAATTA TAATATTAAA 8340 

CCACTACTAG CCCTTGCCGT AAAAAACTAT TCCAAAATAG CTGAGTCTAA GTTTCCTGCC 8400 

TCAGTGTGTC CCACCTCTTG CGCTTGAGTC CTTAATGATC CAGAGTTTCA AGTCCCCAGT 8460 

GCCCTAATCT TGAAAAGCAG AAACTTTAGA AGTTTGCTGA AGTTTATTAG TTGGCTATAC 8520 

GATCCATCAA GAAATTGACT TTTTTGGATT AAATTCAAGA TAGTTTTTAA AAAATCAGAA 8580 

GTTTCTTTAT CATGAAAGCT AAAAAAATAA TTGAAGGTAG AGGCTAGTTG GAATCCCAGT 8640 

TAATAGATGG ATTTCTTCCT TCTTGAAGAA ACTTGTGTCC AAGGGCAAAC TGAATCCTGG 8700 

TGGTCTATGC TGGCCACATT CAGCAAAAAA TGGCCCGAGG TTTTGATGGT TATCATTCTC 8760 

AAAACTGTTC CTGCCAACAC ACTCTGATCC CAGGAGGTTA CCTGACCTTT ATAAGGCTCA 8820 

GTTTCCTGCC CTGTAAAATG GGCAGGGTAA TCAAGCTAGG CAAAATATTT AACCTAAGTG 8880 

AGGAAATTGT GCTATTAGTG CCCTGAAAAA CATGTAGAAA GACATTAGAC ATTATTTTAT 8940 

TTAATATCAT GTTGAACTTA GTTTTTAAAA AGAAGACCTA TTGGATTTTC CAAGAACAAC 9000 

TAAACTGATT CCTTGTAGAC AGTTTAGAGA ATACAGAAAA TTAGAAATAG GAAAAAAGCA 9060 

AAACAAAACA AAAACCATCA AACAAAGTCT ACGCAAATAC AGTTTCTCTT AACTTTTGGT 9120 

TTATTTCCTT CTAGTCATTT TTTAGGTGCA TTTTTAAATT GTGGTAAAAT ATATGTAATG 9180 

TAGAATTTAC CATTGTAGCC ATTTTTAAGT GTAGAGTTCA GTGGCATTAA GTACATTTAT 9240 

ATTGCCGTGC AACCATCACC ACCATCTATC TCCAGATTTT ATAACCCCAG ACTGAAACTC 9300 

CATATCCATT AAATGATAAC TCCCCATTCC CCTCTCCCTA CCCTGGTGAC CACCATTTTA 9360 

CTTTCTGTTT TTATGAATTT GACTTTCTTG GCGCCTCTTA TAAGTGGGAT CATTTTTAGT 9420 

TGTTTTTATA ATCGGTTTCC TTCCTTTAAA AATATGAATG GAGCCTAATG AATATTGAAT 9480 

TTAGTGTACT GGTTTCTTTG AACATTTCAG CATCATAAAC ATGTTTTTGT ATTCTACATT 9540 

CTTCTTGTAT TGCTATATTC TCTATAGGAA TTTTTTTTTT TTTTTTGACA GAGTCTCACT 9600 

CTGTTGCCCA GGCTGGAGTG CAGTGGCACA ATTTCAGCTC ACTGCAACCT CCGCCTACTG 9660 

GGTTCAAATG ATTCTCCTGC CTCAGCCTCC CAAGTAGCTG GGACCAGAGG TGCATGCCAC 9720 

CATGCCTGGC TAATTTTTGC ATTTTTAGTA GAGATGGGGT TTCATCATGT TGGCCAGGCT 9780 

GGTCTTGAAC TCCTGACCTC AGGTGATCCG CCCACCTTGG CCTCCCAAAG TGCTGGAATT 9840 

ACAGGTGTGA GCCATTGGCC CCAGCCTTGA ACATCATTTT TAATGGCTGA AGATTATAGA 9900 

ATCCAGTGGG TGTGCCATCC ATTATTAGTA TTCTGTTGTT TCCAAATATT TGCTGTTTTA 9960 

AACAGTGTTG TGAAAACATA TTTTTGTGTT GAACTTTTAT CATATTGAGA GGCACTTCCT 10020 

CTGTGCAGAA TCAAGAAATT AATTACCGGT TTATAAGGAA TGTGAACCTT TCAGGCTCAT 10080 

AATCTGTATT ACCAAATGGT TAGGAAAAAA ATGTTCAGAA GGTGCCATTC ACAGATGGAG 10140 
TGGGCTTCCA 
TGATTCT^GT 
ATGTAGGAGA 
AACAGCATGT 
CTTGGAGGGT 
GGTTCCCCTA 
TGAATCTAGG 
ACATCACCCT 
GCTGCCTCTT 
TCAGTGCTTT 
TCCAAGAAGA 
T6TGTGCTTT 
ATCTTGGGGT 
GATTGTU^GAG 
CTGGCTCTTG 
ATTTCTTGCT 
GCACTGTTTA 
TCACCATTTG 
CTTGAACGTG 
TTCTAACTCA 
GTGACTCACG 
CAGGAGTTCG 
AATTAGCCAG 
GGCA6GAGAA 
GTGCCATTGC 
AAAAAAAAAA 
CTATCATTGC 
TCAATGATAC 
TTTTTCAATC 
AACACAATAG 
ACTATAAATT 
GCTCTGACAG 
ATATTAAACT 
GTACCCTAAA 
TTCTGGTTCC 
TGACTTCAGT 
TACTTTTTAA 
CTGTATTTCT 
CGGATACCAT 
ATAAACTTAG 
AGACTTTAAT 
GTCCAGAGGT 
GACACGTCAG 
TTTTTGAGAC 
GCACCCTCCA 
GGACTACAGG 
GAGATGGAGT 
CCCACCTTGG 
CTGTTTTGGC 
CAATATAAAT 
ATATTATAAA 
TATCTTCATC 
GGAGGCTGTA 
CCTGAGTCCA 
CCTCTCTGAA 
TAAGGTTTCT 



CCAGGGGCTG 
ATTAGATATC 
GGCATAGGGG 
GCAGATCAAG 
TGCTAATTGA 
CCTGGGGGAG 
ACTCTTGCCA 
GGCTCATCCA 
CTCCCTTCCT 
CCTGGTAGCA 
CGTCATAACC 
GGCTGCCTGG 
GTAATTGGAA 
TGCTTGGAGG 
TTTCTTGGAG 
TTTTCAGTAG 
GTGATGATGT 
TTGGTTTCTT 
TCTGGGGTCC 
GATTCTAAGT 
CCTGTAATCC 
AGACCAGCCT 
ATGTGGTGGC 
TCGCTTGAAC 
ACTCTAGCCT 
AGAACCACAG 
TACCCACACC 
ATGAGATTTG 
AGATCCTCTT 
CAAATGATGT 
CTGTGAAATC 
TGCTCAAGAG 
ATTTATTATG 
TATCTATCTC 
CATCTGCTCT 
ATTGCACGAT 
ATGAAGAACA 
TTAATTTAAA 
AATAATAGGA 
AAGGGCCATT 
TCTGCAGAAC 
AATCTCAGAT 
TTTTTTCCAG 
AGAGTCTCTG 
CCTCTCAGAT 
CACCTGCCAC 
TTTGCCATGT 
CCTCCCAAAG 
TGCACACTTC 
TATTTACTTA 
ACAGAAACTA 
TTCCCCTCAG 
GGAGGCAATA 
CATCCCAGCT 
TCTATCTTCT 
TACACTGTGC 



TGAAGCTCTA 
TAGGAAGGGT 
TGCTGATCTC 
CACTGTTCTT 
GATTATGGGG 
AGTTGACACT 
CTGCACAGAC 
TAACTCTCTT 
AGGAAAGCCC 
CCACCTGACA 
ACAAGAGATC 
CCAGAAAGCT 
TTGAGCTCTT 
CTGAACTCTG 
CGGTGGTATG 
CTCTGGGCTG 
A6GTGAATT6 
TTTCTTATGG 
TCCAAACAGC 
CAATGGTCTC 
CAGGACTTTG 
GGCCAACACA 
ATGTGCCTGT 
ACGGGAGGTG 
GGGCAACAGA 
GAGGGAGAGA 
AACAATATTA 
CTTCCTTCCT 
TTCCCCCTAT 
ATGCACATTT 
CACATTAGAT 
AAAAAAT^T 
GAACTTAAAA 
CAAACCTCAA 
CTGACTTCAG 
GTATGTCCTA 
ATTAGGTGCA 
AATGTATCAG 
TCATTCTATT 
AAGAAAGTTA 
ATCATACCAG 
CCACTGTATA 
ACCAAGGGTC 
TTGCCTGTGC 
TCAAGTGATT 
CATGCCTGGA 
TGGCCAGGCT 
CTGGGATTAC 
TATCCGTAGA 
TAAATTACAT 
T^GTATGAA 
TGGATCCTCC 
TTATATCCCA 
CCACCACTCC 
CACCTGTAAT 
CTGTGGTAAG 



ATCTCAAAGG 
GGGAAGGGCA 
TTCATAAGGG 
TCCTTTAGAG 
AATCTAAAGC 
AGTCAAACCT 
TCCAGCTGGA 
TTGTTTCATC 
ATGTCACAAT 
AACACTGCTC 
TGAATCAGCC 
GGGACTGTAT 
AGTGTGGAAA 
GAAGGACTCT 
GCCCACAGGT 
TCATCGAGCC 
CTCCACAGTT 
GAACTCTGGT 
TTCGTGTCCC 
AGCCTTTAGA 
GGAGGCCTAC 
GTGAAACCCC 
AATCCCGGCT 
GAGGTTGCAG 
GTGAGACTCG 
TCATATATGA 
GTGGAAAAAT 
TAATTTTTCC 
TAATTGTATT 
AACACATTTC 
CATGCCTCTC 
CAAGTTGTGA 
CATACACAGA 
CAGTGATCAA 
TTTATTTTGA 
AACGTAAGCA 
TTTTCATAAG 
ACAACTAATC 
ATACATAGAC 
TGTCATAATT 
GATTCACAAT 
TAATTTTCCA 
TCCTAGCTTT 
TGGAGTACAC 
CTTGTGTCTC 

GGTCTTGACC 
AGGCATGAGC 
TAATTAAGCA 
TATGTACTCT 
GTGAGAATTA 
TGTACATACT 
GTGAGGTGTG 
TTAGTTCTGT 
ATGAGGGCAT 
CATCAATACA 



ATGTTGACTA 

GAGAAGCTTC 

GTGACGGGAA 

TGTGTGTTTA 

CACACCCCAA 

CTCCCATCTC 

CCCAGGGACT 

TCAAACATCA 

AAGCGCGCCT 

GCGGCTGCCT 

CATTTTTTCC 

TTACCTATCA 

TTCTTACTCA 

TCCCTGAGGC 

GGGTGTTTCC 

CACTGTTCCT 

TAATTCCAGT 

CTGCATCTCA 

TCTGAGTGCG 

ACCGCAGGAG 

GCGGGTGGAT 

ATCTCTACTA 

ACTCAGGAGG 

TGAGCCGAGA 

ATCTCAAAAA 

CCCCGTATGT 

6TCTTCAAAG 

CTGTACAGCT 

TATAGGATGA 

GTGAAGGCAG 

CTTTCTCAGT 

CAGTTTAAAA 

AGTTGGCAGA 

CCTGTGGTCA 

AGCATGTCTC 

TTCCCTTTAA 

GGTTTTAGAA 

CATGTTTACT 

TAGTGAGATC 

TTTGTCACTT 

TGTATACACT 

TTTGCCTAGC 

TTTTTTATTT 

TGGTGCGATC 

AGCCTCCTGA 

GTTTTTTTGT 

TCTTGATCTT 

CACTGTGCCC 

TGTACCCTTA 

ATCACACTGG 

AAAATGAATA 

CCAATTTGCA 

TGGGTTGTAA 

GACTTGGAAA 

TAACCCCTTA 

TTTTAGCCAA 



CTGGTAGGGC 

CAAAATTCCT 

TTTTCCTTGA 

TTTGGGGCGA 

ACCGCCCCTT 

TGAGATTTTG 

CCAGCTTCTC 

CTGAGAGATG 

GTGCTTCTCA 

TCAGCTGCTC 

CCTGTGGCAC 

TTTTGATACT 

GAACACAAAG 

CTCTTGGCAT 

TTTGGGAGCA 

TGTCTTCTCT 

GGTAGAGCAG 

CTGTGTTTCC 

GACACTCAGA 

GCCAGGCGCG 

CACCTAAGGT 

AAAATACAAA 

CTGAGGCAGA 

TTGTGAGATT 

AAAAAAAAAA 

GTGAAAAGTC 

GACATTCGAT 

ATATAATGAT 

GATTGATTCT 

GA/IAGGGCAC 

TGGGAGGTGG 

AATATTTTAA 

ATAACATCAT 

GTTCTGCCTC 

AGACATCTTG 

AACATGTATC 

AGGGAA6AAA 

GTTTCTAACA 

AATTTGTCAG 

GCTGAAACCA 

GATTGTGTTT 

TATGGGGTTG 

TTATTTTTAT 

TCGGCTCACT 

GTTGTAGGTG 

ATCTTTAGTA 

AGAAGATCTG 

AGCCTCCTAG 

CTATTTTCCG 

TAAATTAAGT 

GCAATTCTAA 

GACCACTGGA 

AGCCGAACAG 

CATCACTTAA 

CAGGTTATTG 

TAATAACAGT 



10200 

10260 

10320 

10380 

10440 

10500 

10560 

10620 

10680 

10740 

10800 

10860 

10920 

10980 

11040 

11100 

11160 

11220 

11280 

11340 

11400 

11460 

11520 

11580 

11640 

11700 

11760 

11820 

11880 

11940 

12000 

12060 

12120 

12180 

12240 

12300 

12360 

12420 

12480 

12540 

12600 

12660 

12720 

12780 

12840 

12900 

12960 

13020 

13080 

13140 

13200 

13260 

13320 

13380 

13440 

13500 
AATGATAATA 

TGGAAGACCA 

6GTAAAGATG 

AGCATGATGC 

GATGACTGTT 

TTCACAGTTC 

TACCTTAATC 

GGCCTCTCTG 

TGTCTCTCGG 

TCTCTGTTCA 

CTTGCCTCTC 

TTTGTAGCCT 

GCAGGAATCA 

CAGGGGCGCC 

TCCCAGATGA 

AAGAAAGCTG 

GTGCCCAGTA 

TCCCGGGGGC 

TCTTCTCCAG 

ACTTTAATGT 

TAGCAAATTC 

ATCTACCTGG 

ATGCCAATTC 

GGCATAAAGT 

TAAATTATCC 

AAAAAATCTC 

ATCTCGGCTT 

AAATATTCTT 

ATCAGTTCTA 

AACAGTCGTG 

TTCAACTGAT 

ATGGAATGGA 

TTCTAAACAG 

CATTAATACT 

CTCCCCTTCC 

CATCAACTGG 

TCACAGTAGG 

TGCTACGTGA 

AAAACTCAAA 

GGGAGAAGGC 

GAAAATGTCT 

GCCATGGAGA 

TGAATTTAAA 

AACAAGACAT 

TACCAGAGTG 

AGTQAGAAAT 

TTGTGCTACT 

TTTGATGTAA 

GTGAGGGAGA 

AGGTTTGAGT 

GATCAGCCTC 

ACTCATCCAA 

CTCAACACTT 

GGTTTTAATT 

GAAACTTCCA 

AAAGAGCAGA 



ACACATTCCT 

CAGCATGATG 

GTGAGGCTGC 

ATATTGTCAC 

ACAGGGATGG 

CTACCCGAAC 

ACCTGACCAA 

CCTTGCTGAT 

TAAAGCCTAG 

CTGCAGGGCC 

AGCCCCTGGG 

TAT6CAAGAA 

AACGCATAAC 

TGGAACATGC 

TCACAGAAGC 

CAGAGGAAAG 

GTGCCAAAA6 

TTGTTTTTTT 

AAGT6CTTCA 

AACCTCCCCA 

TGACACAGAG 

GATGTTTTGA 

TCCAAACTGQ 

GTQAGATTCT 

CTTAGCATCT 

TCTGTATTGC 

CCCAAAATGC 

TAACAGTATT 

TACTTAATTA 

CTATTCCAAA 

TGAGAGACCC 

CTACAACCTT 

ATGTGAAAAA 

TGGCAGCATT 

TGATTCCAAC 

TGAATGGCTA 

6AATGAAGTA 

AAGAGGCCAG 

GTAGGCAAAT 

AGACAGGGAA 

TGGGACTTAG 

TGTACACCTC 

GAAGAGTAGA 

TTCAACAAGT 

GCAGAAGAAA 

AAGCCGCTTT 

ACAGGTAGCA 

GAGGTTTTTT 

TCTAGAGACA 

CTGCAGGTGA 

TCTGCCCTGC 

GAGATAACAG 

GATCAGTTTC 

GGTTATTTAG 

ACTTTAATGA 

GAAATGGCAG 



AGAGGGCTGG 

CATGAATTTA 

GATGATGGTT 

TCATTTAGTT 

TCACATTGTG 

TACAACATCG 

TATGCTGACG 

CTGTTTTGCT 

TACTGTGGTT 

ATTCATCTCC 

GACCATGGAA 

GTGACCTGTG 

ACTAGTGCAA 

TTGTCCAGGA 

CTCCTGAGAA 

ACTTTCTCTT 

6TAAGGTTGG 

CATTAACCTT 

GATTTTATAT 

AAGCTTTTGT 

GCATCTCATA 

AAATGAACAG 

ATGTTGCCCA 

GTAGCATGAG 

TTAACATGCA 

TCAGGCTGGT 

TAGAATTACA 

CTTTAGGATA 

TAAATACTCT 

CAATTTGGGA 

CCGGCCGGGG 

TTGGTGGTGA 

TGAAAGTCAG 

ATTTTAGCTA 

CGCCCATGAT 

AACAAAATGA 

CTGATACATG 

ATGCAAAAAT 

CCATAGAGAC 

GTGACTGCTT 

ATAGAAGTGA 

AAAATGGCTA 

AACAAACACC 

TTTGGAATAT 

TAGTCTATGT 

ACTGGGAAGA 

GTAAACAGGG 

TTTTCTCGTC 

GCCAAGTACA 

GACTCCTGCT 

TCCAGGCAGA 

CTCATATTCT 

CAGCGATGAC 

TATCTCAATT 

AAGAATTTAA 

GGAAATGAAA 



GATGGATCTA 
CATTTCCTCA 
TCAGGGATGG 
TATCTGCACT 
GGTGATGAAT 
ATATTTTCAT 
ACTAACATGT 
GGTGTGCCCT 
GCTGTACACA 
CAGCAACTAT 
AGAGTGCTAG 
TCTTTCCTGT 
AACTGGGGAT 
AATCTTCCAC 
GGGTTGAATC 
CCAAGATCAG 
GTTAAAATAA 
GTTGGCTGGA 
TTTTAAGAAA 
TCCAGGAATT 
ACCTTTTATT 

tgacacctaa 
tgtctcaa;^ 
atcatatgct 

CTTGAAATCC 
GGCATGAGTC 
ATATATTATT 
TGGGAATAAA 
TTGCCTTTCC 
AAGAGAAAGA 
CTCTACTGGG 
GCTGCTGTCT 
AAGTGTCAGA 
AGAAAAGAAA 
GGTATATACA 
CTACAATATA 
GTCACATATT 
AGAAAGCAGA 
AATGAGTATG 
TGGTTGCACA 
AAATGAATTC 
AAGAAAAAGG 
GGAAAATATA 
GAGTGTGGGG 
ACTACAGAAA 
GGATTTGTTG 
TCAAACCAGG 
GGGTGCAGTA 
CTCTTCCTGG 
AGATGGAAAG 
TACTTTTTAG 
CCCTGATCAG 
TTAT^GATCA 
AAAAAAAAAG 
ATTAAGTTAA 



GATTTTTCTT 

GACATTCTGG 

GTGTGTTGGG 

GATGATGATG 

ATGACCAGAA 

TTGTCTTTCC 

TGCGCCCTGC 

CCACTGTGCT 

AAACCTGTAG 

TTTATCCTTA 

AAACCTACAG 

CATGAGAGAG 

AATGCCCAAA 

TCAGTTCTGC 

CCCCGTCGCC 

AACAAAGGAC 

GAATTTGCCT 

CTTTAGGGAA 

TTCAAGAGTC 

GACTTGGGGA 

TTTTCTACAG 

GAATGTATAC 

TTACTTGCCT 

CTTAAAATAC 

GTAGAGACAG 

TGGGCTCAAG 

TCCACACCTG 

CTATAGATTT 

ACATACTTAT 

AAGCATTTTT 

GAATTTGATT 

GACTTGTCAC 

GGTTGGTAA6 

TCAAACGCCC 

TAAAAGACTA 

TACAATAGAT 

GATGATCQTC 

ATATGATTCT 

CTGGTAGTTT 

AAGTTTCCTT 

ACACTGTGAA 

TATGTTATGT 

GAGGAAAGGA 

CGGAGAAGTG 

AATGGGGTGG 

GACTGAGGCT 

AACTTCAGAA 

AGACTTTTTT 

TCATCTAGAA 

AATGCTGGCA 

ATCCCTTTCT 

AGCTCTCCAG 

GCCCTCACTA 

AAGACAGGAT 

GAAAAAAAAC 

AAAACAGAAA 



CCCCTTTTAG 

TGCTGATGAA 

GGTGATGAAT 

CTGATTATAT 

AGGGAAGACT 

TAGGAACTCT 

CTTTCTTCCG 

CTTGGGTCTT 

ATGATTAAGA 

AGTCAAGAGA 

AGTATGACCC 

GACAGACATT 

CCTGGTTAGG 

TGCCTCCATG 

TGGGGATCCC 

GGTTAGCATT 

TAAGCTCTTT 

GTATGCACCA 

TGAGTTAGGC 

TTAATCTGTT 

ACCACATTGT 

TTATCTCTTC 

CCAATTTTAG 

TAAQTATATA 

TATCTCTACA 

AGATCTTCCC 

GCCTAACATG 

GAAATAATTT 

CTAATAAGCA 

TGGGGGTTTC 

TGTGACACTG 

AGAGCTTATT 

ATAAAGCTTT 

ACATTATCAC 

GGAATAGGTC 

GGTTATTGAA 

ATAAACATCA 

ACTTATTTGA 

CCCAGTGCTG 

TTGGGATGAT 

TGTAGTAATT 

GAATTTTACC 

GGCATTATTG 

GCAACTGACT 

ATGTGGAACC 

TGGACGCAGC 

TATAGAGAAT 

TGTTCTCTAG 

AATAAAGAAG 

GCCAGGCTTA 

GGAGAAACTG 

TAAAATGCAG 

CGAACCTCTG 

CGCTTTTGAG 

CTGATAGTGT 

CTTTTATATA 



13560 

13620 

13680 

13740 

13800 

13860 

13920 

13980 

14040 

14100 

14160 

14220 

14280 

14340 

14400 

14460 

14520 

14580 

14640 

14700 

14760 

14820 

14880 

14940 

15000 

15060 

15120 

15180 

15240 

15300 

15360 

15420 

15480 

15540 

15600 

15660 

15720 

15780 

15840 

15900 

15960 

16020 

16080 

16140 

16200 

16260 

16320 

16380 

16440 

16500 

16560 

16620 

16680 

16740 

16800 

16860 
ATTCTAATCC TTTGCAGAGA TAAAAAAATA CATTGCATAC CTAAAACAAG TACAAGTTGC 16920 
CATGGAAACA GATTCATTAG TGAAGAGGAA AGAGATCTTG GAAATTAAAG ACATAAAAGA 16980 
CAAAATAAAA ATTAAAAAAA TTAAACAGAA TTTAGAACAT AATGTTGAAA TGAGAGAACT 17040 
TTAGATCTCA AAAACAACAG AQAATCAACC CAGGAGATTG TGTGTGACTA AAGAAGTCTC 17100 
AGAAAGAGAA TAGAGGAAAG GAAGGAATAT TATAAGAAAA GTTTCAAGAA TAAAAGGTCA 17160 
TGGGCCTCCA GACTGATAAA AATCCATCTT GTACCCAGAA AAAATTGACT TTTCAAGAAC 17220 
TGAATCAGAA CCTATCCTGT GAAATGTTAG GACAAGTAGA TCCTAAAATC TTCCAGAGGG 17280 
AATCCATTC7V AAGGCCTTGA ATGGCATTAG ACTTCTCCAT ATCAATACTG GATGGTGAAA 17340 
GAAAAAGAGC AATACCTTAA ACTTGCTAAA AGAAAATGAT TTTTAACTAG JATTCAATTT 17400 
CCATCTCAAT TAAAAAACCC ACTGTAAAGA AAAAATTCAA ATCTTCTCAG GCATATAATA 17460 
ACTCTAAAAT TCTACCTCCT GTGCACCTAA TTTTGGCAAG TATCTCAGGA AGATACACTT 17520 
TGCTAGAACA AGGACATAGT TTAA6AAAGT GGAAGAAATC AGATCTGGGA ATCAGGGGAT 17580 
CACATGATAC AGAGGCACAG CCAGAGGGAT CCCAGGGAGA GCATGTCCAG TGTGACAAGG 17640 
AGTGGACAGC TTCAGAAGGG ACAGCACCAG GGGAAAAAAC AAT^TGAATA TCTGATTGGC 17700 
ATAAACATTT GGAAAGTAGT ATTAAAAATG TGTGTAACAG GTGTGTTGTT ACATTTGCCA 17760 
AAAAAGAGCA AAAGGGAAAA AAAACCCCAA GCAGATGAAA AGTAAAGAAG GCAATGGTTA 17820 
ACTACTGGAA AAACAAAAAA CAATATTCAA GAAAGGAAAC GAAATCATGG TATACTTCTT 17880 
GACTAATGGG TGAAAAATGA AGATGTACAT AGTTATTAAA ATGCAAACAT TGATTATTGA 17940 
GTTAACCCAA AGTTGTGACA TTTGGAAQCA CGGGTAGGCA CAGTGGGGTG TAAGAGACCT 18000 
AAATCCTCAC TTACCGTAAT GTTTAAAAAA TTGCCATGTC AAAGAATAGC AGCATATCAT 18060 
ATTATTTAGA AATATGGATG CAAATGCCAG AAGAAAAATT AAAGGAAGTG AAAAATGTTT 18120 
TCCTCTAGGA ATAGGACAG6 GGACGTAATA GGGAACAGAT ATTCTGCATT ATCTCAATTA 18180 
ATTCTCACAA CTGTGACTGA AGCTCTTTTG CTCTCCTTGT TTTGCAGATG AGCAAACTCA 18240 
CAGAGGGAT6 CAACTTGCCT AGGATCGTAT AGCCAGCAGC TCATGAGTGT GGAATGGGGA 18300 
TTCAAATAAG GTCTAGGAGA CTCCAAAATC CATGTGCTTA ACCATGAAGT TTTACTACCC 18360 
CTTCTCTGCT TCTTCATTAA GTATTTTTAG TGCCTAATTG CCCATGCTCT CTGCCAGGTG 18420 
CAGTA7VAGGA GGATTACACA GGTGCAATAT GA6CCATGAC TCTTGTTGAA ATCAGCACGT 18480 
CAAAAATAAG GCTAATGAGC AC6TGAAAAG ATGCTCAACA TCACTAATCA TTAGGGAAAT 18540 
GCAAAACTGC ATTAAAATAT CACCTCATAT ACATTAGGAT GGCTACTATG AAAAAAACCA 18600 
GAAAATAACA AATATTGGCA AGGATGTGGA ATAACTGGAA CACTCATGCA CTGTTGGTGG 18660 
GAATGTAAAA TGGTGCAGCT GCTGTGGAAA ACAGTATGAT GGCTCTTAAA AAAATTTTAA 18720 
AAAAATAGAT TTCTCATATA ATTCTGCAAT TCCATTCCTG GATATATACC CCAAAGAATG 18780 
GAGAAAACAG GATCCTGGAG AQATGTTTGT ATACCCATGT TCATAGCAGC ATTATTCACA 18840 
ATAGCTAACA TCTGGCAGAA CCCAAT6AAT GAGTGGATAA ACAAAATGTA GTATATACAC 18900 
ACAATGGGAT ATTAGTCTTA AAAAGGAAGG AAATTCTGAC ACATGCCACA ACATGGAGGT 18960 
GCCTTGAGGA CATTATGCTA AGTGAAATAA AGCCAGTCAC AAAAGGACAA ATATTATATG 19020 
ATTCCATTTA TATAAGCTAC TTAGAGTGGT CAAATTCATA GAGACAGAAA GTAGAATGGT 19080 
GGTTGCCGGG GATGTAAA6G TGGGCATTTC TCAAAAAACT GAGAAATACA GAAAAATAAA 19140 
AATCACTCAC TGTTTGCCAC ACTTCTACCC TGGTTCTTTT TAAATCTATT TTTCTTACTC 19200 
AAAGAAATAC ATGTTTATAG TTTAAACATT CAAATAGTAC TACAGGTTCG TAATA7VACAA 19260 
GAGCGGTCCA ACTCCCCTCC TCCTAGCCCT GTGCTCCAGT CCTTTCAGAT GTTGTTTCTG 19320 
GTCTTTGTAT TTCTCAATAA CATGCCTAAA TGTATTTTCT GGCTCCTTGT ATTGTTTATT 19380 
TATTATTTGT TGAGTTTATT GCTATGAAAA ATAGAGATTA GATCACTTAC AGGGTCTTCC 19440 
TGACACCGTG CTCACCTTCC CCACCTATAT GTACAATTCA CCTTCCCTGT CCTCATGGAA 19500 
ATAATATTAC TCTTTTAGTT AAGTCACAGG TCAGTATTTA TGTTATGATT ATGTAAATAT 19560 
TGTTTATGTA ATGTGCTAGG GCTACTTTTT TTTTCTTTAA TTCCTTATCC TCCTTCACCC 19620 
TCACCACCCA ACCCCAATCT GATCCTGGAG TTCACAGTTA TCTCATTTTT CCTTTGCTTG 19680 
GTTTTCTAAA ATCTATCTCC TGGCTCTTTC TCCAACTCTT CTCTCAGTAA GATAGTTTCT 19740 
CAGCTCTACC TTTTCCCCTT GTTGACATTG CTCCAGAGCC CTTCAACCTG CTCAGGTGGC 19800 
TATTCTGCTT GGTCACTCAC TTGTCCTCCT AGGTTTTCTT ATCTCCATCA TCTTGGGGAT 19860 
TCTGGTCTCC AATTTCCTGT GTTAGACCAA CTGTGTCCTG GATCCCATAT CTTTCTGTCT 19920 
CTTAGTTTAT TTCTTTGCTT TGATTGAACA TACTACCTAT GACATTTCTG AGAAACAATG 19980 
AAAGAGAAAT GATTTTTTGA GTTGTGGGAT GAATATTAAA GTCACTACCC GGGAAGGATC 20040 
ATTGTGCCTC TATCTGTATG AGGGATTCCC CTTGCACTTC TCAACCATAG ACAGCTCTGT 20100 
TCTGTCTCTT GAGCTCTTGG TGAACCCATC CCCCAGGACA ACATTTCTAT GTGTCTTGGT 20160 
CTGGCACAAG GTGACTACCT ATTCCCAGCA AATGCCAATC AACACCTGTC TTAATAATAC 20220 
CTTAGCTTCA 
ATGCTAATTA 
TTACAGTGTT 
AAGGGCTTTA 
TTGCCATTTG 
TTACTGAGTA 
CCCTGACCTC 
ATAAATAATT 
TACAAGATGC 
GGTTTTTCTG 
AAGTCTAAGG 
GCTTTGCATA 
GGCATAAAGT 
TTAAAGAATC 
CATCCCTGGA 
ATGGTCAGAA 
ACAAAACCCG 
CAGTGCAGTG 
CTTGTGTAGT 
AGAACAAAAC 
AAACAGGGTC 
GGGTCATAAT 
GTTAATGGTC 
AGACTGGTCT 
TAAAATCATC 
CATTGGTCAA 
TAAGGTGTTC 
GTGCAACACC 
GGAGTGGTGT 
TTCAGTTTGA 
TAGGCAGTCC 
AGCTGCCATA 
TCCATGATTC 
ATAGCTTCAG 
ACAAAGATTG 
ACCAATTGAT 
TGTATCCAAT 
AGATTTCCCT 
CTTCAATCGT 
ATACTGGAAG 
CATTTTAGAA 
GGGGATTATT 
AAAGTTTTCT 
TATTACAATA 
AAAATTATTT 
TCAACAAAAC 
ACTCCCATCA 
GAGATGTGTA 
TAACTTCCAG 
AGTGTTTATT 
TCGAATAATG 
TATCATGAGT 
TTCCCTGTCT 
TCCTTTGTTC 
CTTTTAAAAC 
TTATTTCTAC 



ACACCCAAGG 
CTAACCTAGT 
TATAGTCATT 
CATGGGTTAT 
TGAATGAGTT 
TCTACTATGT 
TTGGAGTTTC 
TCACAAATAG 
TATGGGAATG 
AAGCAGTGCA 
TTGGAGAGAG 
TACGAAGAAC 
GCAGCAGGGA 
TGTATGCAAC 
GATGGGGATA 
CCGGCATCTG 
TTTCTTTCTC 
TTTGAGGATG 
GATGGAGGCT 
ATAGTCATTG 
CCAAGCCAAA 
TCTAGGGCTA 
CTGGATCTAG 
GATTCTCTGT 
CTTATTTTTT 
TGTTTTAAAA 
AATAATATGA 
TTTGTCAGAC 
GTAGGAACTT 
AAAGGCAGTG 
CTTGTGGATG 
CAGATGGGGC
GTCCGCAAAT
GGACATTAAA 
TGTAATCACA 
CCCCT^CAGAA 
CTATTATTGC 
TTATGTTAAT 
TTGTTAACAA 
AATATTTCCT 
TTTAACCTGC 
AAACTGTGGT 
GGAACACAAC 
GCAGAAGCAG 
AATATCTGGC 
AATAAATCAA 
TGGCCAATTT 
AAGTACACCG 
AGCATAGACA 
ACCTTTGTTT 
GCTGGGTTTA 
TGGCACTATT 
CCAGGAGATA 
TTTTCATGGC 
ATATGTCCCA 
AATAGGGTTG 



TTTAAGTTGC 
CCTTAAACCA 
CAACAAACAT 
CCAATGTAGT 
TCAGAGAGAT 
GCTAGAAAAT 
TAGTCTAGCC 
TCTATTAAAT 
TATAAAGGCC 
TTAAGTCTGC 
GGAGGGAACA 
TGAAAGGCCA 
CCGGGTCAGG 
CTTGGTGGAC 
AATGGAGGGA 
AACCCAGCTC 
TGTGTGAGAA 
CTAAGTTACA 
ATACATTGTG 
CCTTTAACAG 
ACTTAACATG 
CATGTTTTGA 
GCAGCATTTT 
TTTGGCTAGA 
AAGTTCAGAT 
CTTTCTCACA 
CCACATGGTC
ATATACCCAA 
AAGGGTCCTA 
TATCTCATGT 
CCTGGGGAAG 
AGAGGGTGAA 
ATTGTAAAAC 
AAAGTAGTGT 
ATTTAAAATA 
TGCTCTTGTT 
AATTGATGGA 
GAGTAAGAAG 
CACGAGCAAC 
CAGTCTTTTT 
ATTATTAGCA 
CTAGGGGCCA 
CATGTCCACT 
AGTCGAGTAG 
CTTTTGCAAA 
GCCTTGATCT 
CTATCTACCA 
TTATAGAGTA 
ATAGTAAAAT 
CTAAATACAA 
ACAAGTGGTT 
GCTTTCCTTT 
ATCCTTTTTT 
CCTCTGGCTA 
CAGAGTACCA 
ATACCAAATG 



ATTAATCACT 
TACTCATTTA 
TTATTGTCAG 
CCTCATGAAG 
AAAACTTCTC 
GAGGATACCG 
TAGTCTGTTT 
ACATTTGAGA 
ATAAGCTGTC 
AGGATAAGGA 
GCATGAGCAA 
ATGCGGCTGG 
GAGAAGACCA 
GTGGGAGACA 
TGCGGGATGT 
TCATGACAAG 
TGAACAAGGT 
CCCCAACAGC 
TTGTGAAAGG 
CACACAGCCT 
AGCAAGTTAT 
CTGTTTGACC 
CCGAAGTAGA
AATTGTGTTC 
ATTCTGCTAA 
AACGGGCTTA 
TAAATTTCCT 
TTTTGGTTTG 
GTATGTATGT 
GGATCCCTGT 
TCGGGCTGTG 
TGATGGAAAA 
CCTGTACTAC 
TTCCTGGTGT 
TACAGTACTC 
GACTTTTGTT 
CAAGTGCCAT 
AAAGGGAAAC 
CTTTTTGTTG 
CTGTTATTCA 
TTTCTCCATC 
TATCTGGTCC 
AGTTTTATAT 
GTTGGACAGA 
ATAAGATTTA 
GTAGTGTCTG 
ACATGACACA 
TTCTCACTCT 
GTAGTCATAA 
TTTATTTAAT 
TGCAAAATCT 
GGTGCCCAGC 
TTCTTCTCAG 
CGCAGGGACC 
TTCCCTTTCA 
GCCAGCAACA 



TAATAAAGAA 
AAGAGGTGGC 
CCATATAGAA 
GTCCTGTGAA 
CAGCCATTCA 
CAGGGGGCAG 
CCAAGGGTAA 
CAAGTGTCAT 
CTAGTCTGGG 
AGAGTCAGCC 
AGGCTCAGGG 
AACAAAGAAT 
TAAAGCATTT 
TGACTGCTGA 
GTGAAGCAAG 
TCTGCTGCTC 
GCCTGCACAT 
TGTGCAAAAT 
TGTCACTCAT 
AATAGAGGCA 
AGAATCATAT 
ACACTATATG 
CTTAAAATAA 
CTCAAGAATA 
ATCATTGATC 
TTGGAAATGG 
ACAATACGCT 
AAAATAGCAT 
CTCTAGTGGA 
GATTCTCAGG 
ATCCTTACAG 
AGAACAAATG 
CTGGCTGATG 
GCTGGTAAAT 
TTGATTGTAA 
TGAGGCTCTT 
TCTGATAAGA 
AGCAGAGCTA 
AACTGGATAA 
CCATGCATTG 
ACTTTTTATA 
CCTGACCTGT 
ATTGTATATG 
GATTTAATGG 
CCAAGCCTTG 
CCAATTTCCA 
GCAAAACATA 
ATAGCTACAG 
TTAAGAACTG 
TTTAAGTTTA 
CTGAGAACTT 
TGTCTTCTTT 
CCTGTCTGCT 
CCACTTTTTG 
TCTGCTTCCC 
ATTTGTAATA 



ACCTTCACAA 
ATCTTAGAAG 
GACACCATGC 
GTGGGAATTA 
TTCAACACAT 
AGGCACATGT 
CAGATATTAA 
GAAAAAGAAG 
GCTCAGAGGT 
AGATGAAATG 
GCAGGAAGGG 
GGAATGGTGT 
GTGCACGCTG 
ACTTGAAGCG 
AGGCTTGTTC 
TTTTTGGTAC 
TTTTCTGTCC 
CTGTTTCTCT 
TTGGGAAATT 
ATAGGAATGT 
ACAATTCTTA 
CAGCAGTATC 
CATCACTCTT 
ATAACACATT 
TCCATGAATT 
AGGCAGAAAA 
TAGTTTACAT 
TTACTTCCCA 
AACTTTGGGG 
GATTCTATAC 
ACCTTCTCTG 
TTGCTGATGG 
CTTTAACAAA 
ATTTATTGAT 
ATTCCTTATA 
GTATCTATAG 
ATGTGGGCTG 
GACACTGGGC 
TAGTTTTTGA 
GCTACAGTCA 
AGTCTAGACT 
TTTCGTACAT 
GCTGCTTTTG 
ACGCA7VAGTC 
GTCTGGGTGG 
TGGTGTAAAT 
GAGTTGGGAA
TGGCTATAAA 
GTAAGTTTTG 
TATTTTAATT 
AACAATCAGT 
TTTCAGCCAT 
TCCCAAAGTA 
CCAAACTAAT 
ATCAATACTC 
AGCTGTAAAT 



20280 
20340 
20400 
20460 
20520 
20580 
20640 
20700 
20760 
20820 
20880 
20940 
21000 
21060 
21120 
21180 
21240 
21300 
21360 
21420 
21480 
21540 
21600 
21660 
21720 
21780 
21840
21900 
21960 
22020 
22080 
22140 
22200 
22260 
22320 
22380 
22440 
22500 
22560 
22620 
22680 
22740 
22800 
22860 
22920 
22980 
23040 
23100 
23160 
23220 
23280 
23340 
23400 
23460 
23520 
23580 
GATTAATGGC CTGGAAACAC TTGCATTTTA AAAAAAGGAG TCTTGTTGAC CCAAAGGTTA 23640

TAGGGTTTGA ATGTCTGGCA ACATTGCAGG TGTGAGGAAC GTCTTTGGAA TTCCTAGTTC 23700 

CCCCCAAAAG GTTACTGTCT TCTTCAGTGA CAAACAACCA ACCCAAGCGT GTACCCTGAT 23760 

GCTCCTCATT ACCCCTCAAA ACTTTTTCCT TTTCAATCTT TTTAGTTTTA GCTCTTTATT 23820 

TCCCCTCCAC TTTCATTCCT TATTTAAACC TCTCAATTGT AACTGAAGCA GATGTTATAT 23880 

GGACTTGGGG AAAGGGATCA AGAAATCATT CAGTTGTTTG TGCTTATCTA GAACTGTCAG 23940 

CCCCTGAATT GTGTGGTCTT GGCTGGCATC TGAGCACACC TGGTGCATCA GCAGAATCAG 24000 

TGTTCTCTCA GTTCCTGGTT GGCTCTACTG TCTGGCACCA TTCGGCTGTT TGTACTTATC 24060 

TGGAACTGCC AATGGGAAGA TCACATGGTC ATTGAGAAAC CGCACCCTGA AGAGATGGCT 24120 

AAAAGCCTGG AGGGCATGCC CATCACAGCC TTGCCGGGAG TGTGAAAGGT GGTGTGAAGA 24180 

CCCTGGGGCT CACAGGACTC CCTCACCATG GGGCACAGTG TAAGAAGGTC CACGGTGAAA 24240 

ATGCAGTAGG AGGCAGTTAC ATCAGGCTCT GGATCGATGA TATCAAGGAA CAACCCAGGC 24300

TGAAGGAAAA GGCGTTTGTG TTTCAGGAAA GATGTATTGA GCCTCATCCA TGCTCCAGAC 24360 

TTTGTTTAGG CCCTGGGTTA CAGCATGGAA TGGAATGAAA CCCCTGTTCT TTAGTTTCTT 24420 

ACATGTTGAG TGGGTGAGAC AGAAAGCAGC AATATGGTAA AGAGGGGGGA ACAGGGGAAG 24480 

AATGGTAGGA GATCAAGTTA GAGAGGGGAA TGGGCTAGAT CATGGAGCAA CCGGGGCAAG 24540 

ATGTCAAGCC CTTGGAAGGT TTTGAGCAAG AGAGTGTTAT GTTCTGACTT ACGTCTTGAA 24600

ACACTCTAGT TGCTGTACAA GGAGACCAGG TCAGAGGCTA TTGCAGTTGT CCAGGTGAAG 24660 

GTGGCCAGGT AGCGATGGAG GATGAGAAGT AGAAAATTCT GTGAAGGCAG AGCTGACAGG 24720 

ATTTACAGAT GGATTGGCTC ATGAGAGGAA AAGAGGGACT CACGGATGAT GCCAAAGTTT 24780 

TTGACCTGAG AAACTGGAAG AATGGAATTT CCACTTACTA TGATGGGAGA GGTTGTGAAA 24840 

GGATGACTTA GGGGTTGGAG AAAACCAGGA GTTTGGATAT GGGCCTTAGA TATTGCCATG 24900

CAGATGTTGA GTAGACAGCT GCACATATGA GTTGGGAGTG CAGAGGGAGA GGCTGGGGTT 24960 

CTGGGTATCA GTATATGAAT CATCTGTGTC CACATGGCAT TTAAAGGCAT GAGACCAGGT 25020 

GACCCCCCTT ATAGAAAGAT TAGATCCAAA AGAGTAGTGG TCTGAGGACT GGGCTTTAGG 25080 

CCCTGATGCT CAGAGGTGAG GACCCAGGAA AGGAGACACA GAGAATCCTC TTTGTCAGAG 25140 

CATTACAAAA GGGCTATTTG GAAATAGTTC AGGTGGTGAC TGGGTGAAAA GCCCTTCGAA 25200 

CAGCCTCAAG GACCCAGGCT GGTGGACTGC TGGCTGAGTC CTGTTGTGCC TCAGAGGATA 25260

TTGTAATATT TGGAAAAATT TCTCCAAGTC AAATTTAAAT TAACATGAAT GTCATATGGC 25320 

TTTTTGGTAC GTCCTACAGT CAAGCT^AATA ACAATTGGAT AGGGTAGCTG CAGGAAGACT 25380 

GGGTGTCTCT ACAGTGGTCA AGTTGGAAGA ACAAAGAATG AGTGATTGAT CTTTTGCTAC 25440 

TCCCCAAGGG GAGAAGCCAC TGATAGCTTC CTTGGAAGCA CTTTGTACCT CACCTGCCCC 25500

AGAGTAGATT AAATATTAAG TTTCCTCCCT TCTTTCAAGT CCTAGTGCTG CCATTGATAG 25560 

TGCTGTdACT TCAGGi\AAGT TGCTTAACTT TTCCAAACCT CTATTTCCTC ATTACTAATG 25620 

AGTAATAATT CCCACCATAG GGTGTTTATA AAGATTAAAT AATTTTAAAT ATGTTGAAGC 25680 

ATGTAGTGAA CTGCAAAGCA ATATGCAAAT ATAAGAGGTG GAAATGACTA TGCCTATAAT 25740 

TACX3TGGCTC AATTTACACA ATAATAGATT TTCACACTTT GCATAAATTVA TGAGGGTTTT 25800 

TATACTCAAG TCACTGAACT TACTATCTTC AGGATCCAAA ATCCCCAAAC AGAAGGCATC 25860 

CCCTACTGTT AGCTCAAATA GCTCTTGCTG GTTTAGAGAG TTAATGCAAG CCCCACTGCC 25920 

TCCTGAGCTG GAAACATGAA ACAGAAGTTT CAGTTCCCTA ATCAATCCAT TCTTTCTTCC 25980 

TCTGGCTTCT GATAGGCCTC CTCCTTATCT TTGTAAACCC TGTAGCTGGT TGCTAGTTGA 26040 

AAGTGCCTCT GATCTCCCTC TTCTGCCTCC CATGATGTTG ATAAAAAGCA CGAGGGCACA 26100

TGCAGGATGA AAACGATCGT GGTCCTGCCA GCCTGAATTA TTAAAGCATT TCAGTCCTAA 26160 

GTATGAGGTG TGTATATGTT GGGGTGTGGA GTGAGTTGTG GAGATGAGAG ACAGCTGAAT 26220 

TACATAAAGT TGAGAAGATC TGAGTTCTAG TCTTGAAATT CACAAGCCAT CTCTATACAA 26280 

TAGTTCCGTT ACTCAGTAAA GTA/kAAGCAT TGGATCTAAG CTTTAAGGAC CCTTCTAGTT 26340 

CTTTCTGATT GGAATTCTGT GACTTCATCT TTTGTGGGTT AGAAACTCAT CACTCTGTCC 26400 

AGTTATTTCT ATATTATGCC ACCAGATGGC AATGTTTCCT TAACCCCAAA GAAAGTTTTC 26460 

ATTCTGGTAA AAAGTCAAGT TTTGTTGCCA ACTTTTCCCC CTCTGAACGT GCAAAAGAAT 26520 

GATTTTCCGA AGCTGTGGAG GAAAGAAAGA ACTCTCCTTC TGAACATCTC AGGTGGTTTA 26580 

TGCTGGAAAC AGACAGGACC CTGTTTAGAG AAGATCTCTC TTTTCTTCGT GGACTGGGAA 26640 

CTCCAGTTGG AATGATGTCT CCTGTGATTG CGTATGGTGG GAGGTGGGAG ATGTTGGAAT 26700 

TGGCGTGTCC TCAGGAGGCT TGGGGGTGGG GGAGATGTGC CCTAGCTGGT GGGCCTGCAT 26760 

GAGCCCTGCA AAACTCTGAC TTATAGAGGG GCATCAGATG CCAAGTTTTA CCAGACCATG 26820 

CAGAACTAGG AATTGCCAGA TGCACTCATA GGGCAGCT7UV AATGGTCCTG GCAGAATCAG 26880 

ACTCTTTCGC TCATAAAGGT CAGAGACGCA AGAAAGTGAC ATAAAGTCCA GCCCTTTTCT 26940 
TGTGCAGATG
CAAGTTACTT 
TCAGCTAGCG 
ACACACCAGG 
ACCTATCAAA 
TAATCTGCAA 
TTAACAACGA 
GTTAAGCACT 
TACTATTTCT 
TCACAGAGCT 
GATCTTCTAG 
TGATGCAGAA 
GTTTTGACGT 
CAGCATCACA 
AAAGCTGCCA 
CCCACTACCT 
TAATGGAAGC 
ATGCACGATC 
GGGGCAAATG 
TCTTCATAAA 
CTTGTGAAGA 
GAGATAAGGA
CATGTGACCT 
AGGGGATAAT 
AAAGTACCTG 
GATGAGTATA 
GAGGCACTTT 
CTGCCTATTG 
CCAGAAATCA 
ACCTGTGCCC 
AGCCAAAATA 
CTCAGAGTCA 
AGGAAGAGAG 
AGGGCCACTG 
ATGTATTTCT 
CAAGGGCATC 
GGGATAGCAG 
AACAAATCCC 
TCCTTTTGGG 
ACTTTGCAGC 
CTGGAGTGCA 
TCTCGTGCTT 
TTTTTTTTTG 
CTCATGATCC 
AAACATAAAC 
AAAGAGATCC 
TAAATTCAGC 
TATTTCCTTT 
ATTTTTTCCT 
TAAAGAATTC 
TAATAACCCA 
TTTCTCTCAA 
TCACTTTTTG 
GGAGTCTATT 
TAACAAAGAA 
GCAGTTCACT 



GGGAAATTGA 
TGTATTTAAA 
CACCTTGTTA 
GCATGCTGCA 
CAGCAGACTT 
AACAAGTCTC 
TATAATAACA 
TAACATCACT 
GCTTTACAGC 
GGAAGGAGCA 
CTTCTATGCT 
ACTTATCGGA 
GAAGGCTCAG 
CAGGACATTC 
GCACCCGTGT 
GCAAAGGTAA 
TAGTGTGTTG 
TCATTTAGAC 
TCTGCAGTGC 
TCAATGATCC 
TGTAGACAAA 
TTTCCAGTCT 
TGGGCAAATT 
ATCATATATG 
GCCAGCAGTT 
GGGGCTACTA 
TGAATATCTA 
ACAGCTAATT 
GGTATTAAAT 
AATAAATGTT 
AGTCTCTTTG 
AACGTGTGCC 
AGGGCTCCAG 
GTGCCACATG 
ATCTAAGAAA 
TCCCTCTAGA 
AATGCCTGGA 
ACCTACATCC 
TTAGCAACCA 
GCCTCTTTTT 
GTGATGCGAT 
CAGCCTCCCG 
TATTTTTAGT 
GTCCGCCTTG 
AAAAACTAAC 
ATATTCGTCA 
CTGTAATCAC 
TATAACAAAA 
TTATGGTTAA 
TCTCATATTT 
CCTAGAATTG 
ATGAATATTC 
TTGAAAAATC 
GGTCTGTCTA 
CCCCAACATC 
TCTGATGCAG 



GGCCTAGAGC 
CATTTCAAGT 
AGCCTGTTGG 
TGCGCTCAGT 
AGCTAGTTGG 
ATAGCAGGTT 
ACAAACATTT 
ATATCATGCA 
TGCAACAGAG 
GAGCCAATAT 
GTGCTGCCTC 
GTTTCTTACC 
TGATAGTGAG 
AGAATGTTGA 
CCAAGAAACA 
AGAGGAACAG 
AGAGTCTCCT 
CTTGTGACAG 
AGAAAGTCGT 
TGTTTTACCT 
TGACGGTCAT 
GACAGACTGG 
ACCTAACCTT 
TTCCAAGATT 
TCTGGCACAT 
ATGCCCATCC 
AACCCATTTA 
TGCCTCATCC 
TATCAGGGCT 
TAAGAAATAA 
CTTGCGCTTT 
TAATAAACAT 
TGTCTGTCAC 
TCCAGGTCTG 
GAAGACTATA 
AGTAGAGATT 
TGGTGTTCTA 
GCCTTCCTCG 
AGAAAGAGTA 

CTCAGCTCAC 
AGTAGCTGGG 
GGAGACAGGG 
GCCTCCCAAA 
AACTTTCTAG 
GAGAATAATT 
CCAGAGATAA 
CTTTTTTTTC 
TGATTCTTGT 
TCTTCTAAAG 
ATTTCTGTGT 
AGTTGGACCA 
AATTGTTCAT 
TAAGTTGAAC 
TCACTGATGA 
GAGATGCATC 



AGGTCAGCTG 
AGACTTTTCA 
CCTCCGGGCC 
GAGACTTCAA 
GGAGAAAAGT 
TTTATTTTAT 
GTTTAGTGTT 
CTTTTGCTAA 
ACTCAGAGAG 
TCACACCCTG 
TTCATGACAG 
GGAGCACCAG 
CAGGCTCAGG 
CTCCAGGGAT 
CTCAGAATCT 
GCAGTGCTGG 
CTGTGTCATG 
CATGTTGTAG 
CCTGAAGAAA 
CAAAAGTGCA 
TGCCCAGAGC 
ATTCCTGGCT 
TTCTGAGTCT 
GCTGTGAGTA 
AGTAAGTATT 
TTACTCCAGA 
AAGCCCACTT 
TACAGGACAC 
TCAGGAGCCA 
ATAAGAGCCA 
AGATCTTAAG 
TCTACAAAGG 
TGGGAGACCA 
TTAAGCCCTA 
AATGGAAAAG 
GTGAATCTGC 
GTGCCTGAAT 
CTGCCTGAAA 
AAGTCTGGAA 
TTGAGATAGA 
TGCAAGCTCT 
ACTACAGGCA 
TTTCACCGTG 
GTGCTGGGAT 
TTTTTTCTTT 
GGAAAAAAGA 
CAATTATTAA 
TTGTGAAATT 
TTCTTATTTA 
CTTTATACAG 
ATGGCATGCA 
AGGCTGTCCT 
ACATGGGTAG 
AGGATCAGAC 
ATACACTAAA 
AGGGCAATCG 



TCCTGATTCT 
ATCATCTCAT 
TGCCAAGCCC 
CAGCTGACTG 
CATTTAAAGT 
TTTATTTTAT 
TCCTGTGGAC 
TAAAGCTGTG 
GTTAGGTAAC 
ATTTGCGTAA 
TTTTTCTCAT 
TCACCTCTCA 
GTCTACAGAG 
GTTGAGAGAT 
AGGTCTCCTT 
GACCGAGGGA 
CTCTGAGCGA 
CAAGGACCCC 
TGGATGTCAG 
TGAAATGGAA 
AGTAGTTACT 
CAACACCACC 
CAATTTCCTC 
TTAAATGAGA 
CAATAAAGAC 
GACTTCTTTC 
TTCTCTATGG 
CTTCCATGTT 
TGGTCTATGA 
ATATAACTAT 
AGTCCTTTAT 
TCCTGGCGTG 
GATGGACAGC 
TGAAAGACAC 
GGAGAGGGGA 
AGCAGAAAGG 
GGAAAAAGGC 
TCCCACCATT 
GACTCTTATT 
GTCTTGCTCT 
GCTTCCTGGG 
CCTGCCACCA 
TTAGCCAGGA 
TACCACCTCT 

GATAAGCAAA 
AATTTAGGTA 
TAATAGAATA 
GGAAATCATT 
TTTTGCCCTT 
GTAGAGACAA 
TTCTCCACTA 
ATCTCTTTCT 
AGGCTGTGCT 
GTCATTTTTG 
CCCTTTGCAA 



ATCTCCTTGC 
CTTGCTGTGT 
CTGCATCTAT 
ATTCGTTCAA 
AATTGCTTAT 
TTTTTTGCTT 
CAGGCTCTGT 
AAATAGTTAT 
TTGCCCCAGG 
TTCCAGATTT 
GTACAGGATC 
TCATTTTCCT 
TTGGTGATAT 
ACTCCTGCAC 
GTATATTTTC 
GCGACAGTCC 
CATGTTTTAT 
ATCATCACAG 
ATAAAAACAG 
ATGGAAATAT 
GTCAGT^AAAA 
CCCTTCTAAC 
ATCTTCCAAA 
TGATGTATGT 
TAATGGTGGA 
TGACCATCAT 
CTGGCCATTT 
TCCCCAGACT 
TGAGTTTACT 
AAAGACCAAG 
ATTCAAGCTG 
GTGTGACCAA 
CACGTGGGGC 
TTGAGTCATIA 
GAAGACCTCT 
TTTTAAACAA 
CACAATGACC 
AGGATTTTTT 
CCACATCTTC 
GTCACCCAGG 
TTCACATCAT 
CTCCCAGCTA 
TGGTCTTGAC 
TCTTAATTAC 
TTAAATTACA 
ATCAGAAAAA 
TTCACTTTGT 
CAATTGAACT 
TCCTGAGTCA 
CAAATAAGGT 
GTTCTACTTT 
CTTTGCAGTT 
GGGCACTCTT 
TTGTTTCAGG 
TTTTCCATTG 
GGTGAAGTGT 



27000 
27060 
27120 
27180 
27240 
27300 
27360 
27420 
27480 
27540 
27600 
27660 
27720 
27780 
27840 
27900 
27960 
28020 
28080 
28140 
28200 
28260 
28320 
28380 
28440 
28500 
28560 
28620 
28680 
28740 
28800 
28860 
28920 
28980 
29040 
29100 
29160 
29220 
29280 
29340 
29400 
29460 
29520 
29580 
29640 
29700 
29760 
29820 
29880 
29940 
30000 
30060 
30120 
30180 
30240 
30300 
CTGCACCATT GGAAGTACTC TCCATCCAGG
TTCATCGACC CAGAATGAAA GCATCACCCA 
AGAACCAGTC AGAGTTCCAC CCACCTGCAA 
TGGAAGGAGG GAATACCCAG ATACAGGAAA 
TCTTTGAATG AAAATGACAG TCTTAATTAC 
TAGAGCAAGT CCTCTTACCC CATTTCTTCT 
GGCAATCCCA TATAAACTTT AGAAAATGCT 
TTTTATTGAA TCCATAGCTT CATTTAGAGA 
AGCCACTAAC ATAGAATTTC TCTCTTTTAC 
TTTGTAATGT TTTGCGCAAA GTTCTTGCAC 
GATATTTTTA AGATGCTAGT GTAAATGTTA 
TCTGTTTAGT ATATAGAAAT TTJU^TTCATG 
TGAAACTTTC TTAAATTCTA AAAATTATCC 
ATAACATAAT CCACAAAAAT GACACTTCAA 
TTCTCTTTCT TGCATTTCCC ATGTGGGGTC 
GTGAGCATCC CTGTTCTGTA CACAGCCTCG 
CAATCTGGTT GTTATAGGTT TTATTGTAGC 
TTTTTTCTAG TTTCTAAGAC TTTTAATCCA 
GCTTGTCTCT GCATGTATTG AAATGACTAT 
GGTAAATTAC ACTGATATTC CAAAGTTAAA 
CCAAATTTGG ACATGATGTA TTTTTAAATA 
TATTTAGAAT TGTTGAGCCT ATGTTCAAGA
ACTGTTCATA TTGGGTTTTG GTATCAAGAT 
TCTCATTTTT TCTATTTTCT GGAAGAGTTT 
TCATAAAATT TGCTTGAGCC ATCAAATCTT 
AATCAATGAT TTTAATAGTT ATAGGATTAT 
TTAGTAAGTA GTGTTTTCCT AGGAATTTGT 
CAGAGTTGTT TATAATATCT TCTAATTATC 
TTATTTTTGC TTATAAATTG ACAATTTATA 
GTTATGATTT ATGAAAGCAA TGTGGAATAA 
CTTAAATACT CATCATTTTT TGTGGTGAAA 
AAAAATGCAC AGTACACTAT TATTATCTAC 
AATTATCACT TTTCAACTTT ACAATAATGT 
ACTTATGTTG GGATTATGTC TGGATAAACC 
TCCAGATATA ACTCCATCAT AATTTGAGAA 
AATCTCAAAA AAAGACTTAT TCCTCCCGTC 
TTCATTCCCC TCACCCACAG CCCCTGTAAC 
GATTGCTTGA GATTCCACAT GTAAGTGAGA 
TTATTTTACT TAGCATGATG TTCTCCAGTT 
CCTTCTGTTT A7UVGGCTGAA TTATCCCATT 
TCATCCATTG ATAGACACTT AGGTTGATTC 
GTGAACATGG GAGTAAGGAC ATGTCTTAGA 
CAGAAGTGGA GTTACTTGGT CATATGATAA 
ATAGTTTTTC ATGATGGCAG TACTAACATA 
TCTCCACAGA TGTTCTCTTT TTCATTACTG 
TGTCTTCATC TGTCTCAGCA GAGGTTTATC 
ACCTTTTAAA TCTTGTCTAT TGTATTTTTT 
ACTTCCTTTT TTCCATATTT TTAGGAGATG 
GGACTCCTAG AAATATGTTA AGTCTGCTCA 
TGTTTTATCT CTTTCTTATT CATTCTGGGT 
AATTATCTCT TTACCTGTGT TGAATTTGCT 
TTTATTGGGT TTTTAAATGT TAAAATTCTC 
ATGATTTTGT TTCAGCTGAT TGCCAAAACG 
AGTGAGCACT GTTGTTTTAC AGTCTTTATG 
CGTTTCTGGT GGAGGGACTG GCTATGAGAG 
CTTACCCATA TTTCTTTCAT GGTGTCTATT 
GGAGAGAGAC TGGAAATGGT CCATGAGGTT 30360

TCATTCCCTT CTTGTTACAG CCTATTTGTC 30420

AAGGTTGTGA CGTGCTGTTT GCGATTTGCC 30480 

ATGCTAGTGA CGTGCACTTC CATCTAACTA 30540 

TGCAGTAAGA TAAGCAGACT CTATACCTGG 30600 

TCT^GAAGGT CTTGGCTAGT TTGGAACCTT 30660 

AGTTAAGTTC TTTAAAAATC CTGCTGAGAC 30720 

GAGCTGACAT TTAAATTAGG GAGTGCTCCA 30780 

TCCAGGTCTT CTTTAATTTC TCTCGAGTGT 30840 

ATCTTTTGAT AGATTTCCCC CTAGGTTTTG 30900 

TTGCTATATA TTTTTCATTT TACAAATATA 30960 

TTTCTGTATT GACTTTATTG AGTAACCTTA 31020 

ACAGCTTCCC ATAGATTTTC TATGTAGGTA 31080 

TTTTTTCCTT TCTGTTTCTT ATGTCTTTAT 31140 

CCTAGACACT GTTGAATAGA TGTCGTGATA 31200 

AAAGGAAAAT TTTCAGAGTT TTGTTTTAAA 31260 

AGCTCTTCAC CAGATTACCT GCATGTTTTC 31320 

TTAATGAGTG GATGTTGAAT TTTAACAAAT 31380 

ATGACTTTTC CCCAATTGAT CTGTTAAGTT 31440

GCAATTTTTA CACTGGCACC CTCAAGTAAG 31500 

TATATTGCTG GTGTTGGCCT GTTAATATTT 31560 

ATAAAATTGG CTTGTGATTT TCCTTCACAT 31620

TACTCAAGCC TCACAAAATA ACATAGGGAG 31680 

GCATAAGTGT GGCATTATAT CTTCTTTATC 31740 

AACATTTTAT GACAGGTTGA TTTTTTATTA 31800 

TAGGATTTTT TATTTCTTCT TTTGTTAATT 31860 

CTATTTTATC AAAATTTATA AATTAATTCA 31920 

TTTGTAATGT CTGCAACACA TGTAATAATG 31980 

ATTGCGTATA CTTATGGGGC ACAAAACAAT 32040 

TTAAATCTAG CAAATTAATA TATCCATCAC 32100 

ACATTTGAAA TTCACTTTTT TTCACAATTT 32160 

AGGTGGTTCC TGACTTCTTA TGATGATTTG 32220 

GAAAGGAATA TGCATTCAGT ATGCTCTATG 32280 

CATAGTAAGT TGAAAATATC AATGGGCTCA 32340 

GCAGCTGTAT ATTTATCATG GTGTGCAATA 32400 

TGAGATTTTG TACCCTTTGG CCATCACTCC 32460 

TACCATTCTA CTCTCTGCTT CTATGGATTT 32520 

ACATGTGGTG TTTGTCTTTC TGTGTCTGGC 32580 

TCAGTGATGT TGTTGCAAAT GATAGAATTT 32640 

GCATGTATAT ACTACATTTT ATTTATCCAT 32700 

CATAACTTGG CTAGTGTAAA TAGTGCTGCA 32760 

CAATCTGATT TCAATATTTG GATAAACACC 32820 

TCTAGTTTTA GTTTTTAAAG TAACTTTCAA 32880 

CACTCCCAAC AGTGTACAAG GGTTCTCCTT 32940 

ACATGAGTTA TCTGTGCCTT TCCCATTTTT 33000 

AATTTTATCA TTTAAAAGGT AAAAATTGTT 33060 

TGTTTCATTA ATTTTTGCTC TGATTTTTGT 33120 

ACTTTGCTGT TCTTCTAACT TCTCTTTCTA 33180 

TTGTATTTTT CTCACCTTTA TATTTTCCAT 33240 

AGTTTCTTCT AATCTACCTT CCAGTTCATT 33300 

ATTAAACCTA TCTGAATGAC TTTTTCATTT 33360 

ATTCCTATTT GGTTCTTCCT CAAATTTGCA 33420 

TTTTTAGTTC AAGTTCATCT CTTTGAGCAT 33480 

TAAATACCTT CTCTTTTATT AATCTTTCCA 33540 

ACAAAAACTT TCTTTCAGGT GCTTTTAGGA 33600 

ATTTTATTAT CTCATTATTT AGATACTTTT 33660 
CTCCTCTACT
TCTGTGTCTC 
GTGAATGATT 
CTTCAACTTT 
CTCACTGGCC 
CACCTGCTCT 
CTCATCTTTT 
CCACCTTATC 
GCTTTATATC 
CTGCCTCCCT 
CACAATTATC 
TGTCAAGCTA 
CTCTAAAAGT 
GAGAGGTTAT 
ACTCAACCTG 
GAGACATGAA 
AAGATGTGGG 
TTGAATGATC 
TCTGGCTGTG 
GGCACGTAAA 
CAGATGTCAA 
TTTGTGATAA 
AGACATCTGC 
CTTCTGAGCC 
ACAATCAAGA 
AGGAGGGGGA 
CAAGAACAAG 
CAGAGCCAAC 
TCTGGAAGGG 
GGAATTTGGC
TTGTCTGTTA 
AACCAGGTAC 
TACCTCTGAC 
AGCTCTCTAG 
TACATAATGG 
TATTAATACT 
AAAATACACT 
GAAATCTGTG 
CATTCATATG 
TTAGGTGATA 
GAATGCCCCC 
TAGGAAGTGG
CCTCTAGAAT 
GTGGCAGCGT 
AGAAATGCTG 
TATTCTTATA 
TTGCGTAGTT 
GACCCAGAAA 
TGTAGGAAAA 
TTGGCATGTT 
GGAAAGTAAA 
TGTCTTCAAC 
AATGCCAAAG 
ATTCATAGGC 
TCTGGAGGGC 
GAGAGGGAGC 



AAACTAATGG 
TCATGGTATC 
TTAGTCTTAC 
TGGCTTCTTC 
TCCATACTGT 
GCCTGCTGTT 
ATTCCCTCAT 
CAATAACCAG 
ACATATCACT 
AACTAATGTA 
TCCAGTGCCT 
GGTGCTGTGA 
TTAGTATGTT 
TTAGCATACG 
TGCTGTCTAA 
CATGCAGGTT 
GAGTGTAGGA 
AGCTTGATCA 
TCATCAGCTG 
GCAATGTGTC 
GCACTGTGTG 
GAACAAAAAC 
CGCTTCAGGC 
AAGTGCAGGG
AACCACCGGC 
ATTGGAATAC 
CAGAAAGGAA 
TCAGCGTATC 
GCAACCTGAA 
TCATTGCTGG 
AATGGAAATT 
CTGCAATGTA 
TGGGGATGGA 
TCACCTTCCT 
AACAGTGGCC 
AGTTATCCAG 
CAGTTCCCAT 
TTTTTACCAA 
TTGACATTTA 
ACGATGAAGC 
TTGTCCCTTC 
GCCCTCACCA 
TCTGAATAAT 
AAATGGACTA 
CCCTAGGATT 
AAGACCAAAA 
ATAGCACCTT 
GAATGTCTTT 
TTAACACCGG 
ACTTGAACTT 
GACCAGTCAC 
AGAGGGAATT
AGGGGATAAA 
TGAAAAGAGA 
GCTGGGGCCA 
AGAGAAATAC 



TTCAAGGCTT 
TAGCAGACTT 
CTACCTGCCT 
CTCAACCTCA 
CCCTTAAATA 
TGGAATGCTC 
CCCATCTCTT 
CATTCCGTCT 
AAGTGACAGT 
TAAGCTCTCT 
TGAATAGTGT 
TAACTACTTT 
ATTATTGTCT 
TATGAAGACA 
AGCTAGCCAG 
GGGATCCCAA 
CTTTTTTTTT 
GAGAATCCCC 
GCTGATCCAA 
AGAAAGAAAG 
GTGGCACACT 
TTAGTTGCTT 
TTTCTGATTG 
GCAGAAGAGC
AGCTCCATTT 
CATTCACCTG 
ACACCAGATT 
ATTGTTTGCA 
TAGAGAAGAG 
GTGATCCAGA 
ATAGTAGCCA 
TAGTGCTAAG 
AGCCAGAGGA 
GATCCTTCCT 
CTCAAAACTT 
GTTGCTAGTT 
CCCCAGATTT 
AGTATCCCCT 
AACCTCCACT 
CCTCATAAAT 
TGCCATGTGA 
GACGCTGAAT 
AAATTTCCGT 
AGATGTGCAC 
TACAG/^TGC 
TAGAAATGAA 
TTCATCATGT 
GCTGGTAGGG 
CCACGAGACA 
CTAGAAATCT 
CTCCTCTATC 
TAATGCAGGG 
GAGATAGCTC 
AAGGGAGGAG 
TGGAGGAAAT 
CCTGGCTTCT 



ATCAAAGATA 
CCACCCAAGA 
GTTAACTTAC 
ACTACCCCAT 
AGGAAAGCTG 
TTCTTCCCAT 
CAAATGTGAT 
CCCCTCTGCC 
ATACTATAAA 
GAGGGCAGGG 
CTGGCATGTA 
ATATGAAATT 
CTGTTTTACT 
GAATTAGTGA 
GCAGCCTCAC 
ACTGTTGGGA 
TAATAGGCCA 
TACCCCTACC 
ACAGCAATGT 
AAAAGGCAGC 
TGCCCGTTCA 
CCCTCAGGTC 
GTTCCCACTG 
TCCCAAGAGC 
GCAGGATCTC 
TCCCCTTTGC 
GCCCAGAGCA 
TTGATCATCT 
TCTGACATTG 
GACAGTTATT 
CTTCACAGGA 
CCTGACACGT 
GCTGGACCTT
TCTTCTGTGT 
GTTTTCATAA 
ATTAATACTA 
TTCTATTTCA 
ACTATAGAAT 
GTGATGATAC 
GTGATTACTG 
GGTCACGGTG 
CTGCTGGTGC 
TGCTTGTAGC 
CCTCATGCCC 
TGACAAAGCT 
TACTCCCTTG 
GCAAATGAGA
ACTACGGGAG 
ACTGGTTGCT 
GTTCTTTCTT 
AGTTGGAGTC 
AATGGGTCAC 
AAAGGTTAAT 
ATAGTGTTCC 
GTAGTAGCTG 
CATTTTCTTT 



AATCCTCTGT 
TATAAAGACA 
CTACTTGCAT 
TCTTCCCATG 
CCCTAGCCTC 
ATACCCATCT 
TTCTACTIGAG 
ATTCTCCATC 
CGTACCCATT 
ACTCTGTTTT 
GAAGGAATTC 
AAGTATTTCT 
GATGAGTGAA 
GTGATTGACC 
ATACATGGCA 
AGCATAAAAG 
GTGGCCCTCT 
CCTGCCTCAG 
CAACAAAAGA 
TCAGATGATG 
TGTTGTTGAT 
CTCCCTGTAT 
GTTTGGGGCA 
TCCTGGGAAA 
ATCCCATCAG 
AGATACACCA 
CAGGATTAGG 
GGGGATGAAG 
GAGTCAAGCA 
TAATCTGAGA 
TTGCTGTAAA 
AGCAGGGTGT 
TATTTGACTG 
GTACACGGAC 
GAATTATCCA 
GTTATCTGTG 
GTAGGTGGTA 
TAATTTTTGT 
CAGGTGGCTT 
ACCTAATAAA 
AGAAGATGGC 
CTTGCTCTTG 
CTAGTCTATG 
TTTAGGGAAT 
TTGTTGACTC 
AACTCCTTTG 
CGCAAATGAA 
AGAGAGAAGA 
AGCTCGGTAG 
CTGTAAAATG 
TAATCAGGAA 
ACCAGTGTTA 
AAGAGCAGAA 
CGGAATCCCT 
CTGGAGGCAT 
CTCCAGTCCT 



CTTGTTCATC 

CTATGACTAA 

CTCACTTATA 

GCTCACTGTG 

AGGGCCTTTG 

GTTTTAATCC 

GGTTCTCTGA 

ATCTCACCAT 

TGTTTACTGT 

ATTTGTACAC 

AAGAAATACT 

CCTCCAGCAG 

CTGAGGTTCA 

TGAGATTTGA 

AATGCCTACT 

AAAAACACTA 

CTGCAACCCT 

CCAGTTTCTA 

ATGGTGATCA

CAAGATCATC 

TTTTTAAACA 

GGATTAGTGC 

AAACCGGAAA

ACTAGGAAGG 

GGGCTGTCTC 

ATGTCTCGTT 

ACACACCACA 

CAGGCTCCGT 

GAACTTGGTT 

ATCAGATATC 

GAGTACATAA 

TAGTAAGTGG 

GCCAGAAGCC 

AATGTTTTTC 

GGTTGCTAGT 

TTGCTAGCTA 

GTGGGTTCAG 

GTTCCCCCCT 

TGGGAGGTGA 

AGAGACCCCA 

ATCTATGAAC 

GACTTCCCAG 

ACATTCTTTT 

TGTGACTTTG 

AAATGCAAAA 

GATGTGCACT 

TCCTTAGTTT 

GCCAGAATAC 

CTGTGCAACA 

AATATGGTCT 

GATU^CCTAAG 

GAAAAGCTGC 

AGTCACTAGT 

GATGGGCTTG 

GCTCAGGGCA 

TGCAGGCACC 



33720 

33780 

33840 

33900 

33960 

34020 

34080 

34140 

34200 

34260 

34320 

34380 

34440 

34500 

34560 

34620 

34680 

34740 

34800 

34860 

34920 

34980 

35040 

35100 

35160 

35220 

35280 

35340 

35400 

35460 

35520 

35580 

35640 

35700 

35760 

35820 

35880 

35940 

36000 

36060 

36120 

36180 

36240 

36300 

36360 

36420 

36480 

36540 

36600 

36660 

36720 

36780 

36840 

36900 

36960 

37020 
TCACTGGCTG AACTCAGGGG AGCATTTCTC CTCTACAGAA CAGAGTCTCC TTGCATACAA 37080

CAAGAGGGTC AAACAGAGGA TGGCTTAATT TTTCCTTCCA TTTCTCACTT CTATGATTCT 37140 

CTCCCTTCAG GTTAAGTAAG TGAGGGTAAG TAAGCTGCCC AGTAAGTGAA CAGTTTTCCA 37200 

AACAAGCCCA CAGCACCACC TCTATATACA GCAACTCTCT GTTTATCAGC ACTGCATTAA 37260 

CCAGGACTCT CTATTAACTG GGACTTCCAG TTCCTTAAAT TTCTTCATGG TTCCTGTGTA 37320 

CTCCCA7UVGC ATCTTCATCA AACAAACATT AAGTTACGCT TAGAGACCAT TTCTCAATTG 37380 

AATATAGATA AAAGATTCTA AGGCCTTGAA AAAAATTAAT ACATGCATAT TAGATATAGC 37440

TATAAAAGCC AGACTATCTG ATTAATTATG TGACTGGTGT TAAACTGTTT GGACAAAGGT 37500 

TGGCTAAATT CCCTATGAAT ACTTACTTCC CTACTTCTGT GGACAAGGAA AAATAGACCA 37560 

AAGGTTCAGA TAAAAGCTTG ATTCAATGTC ATCTCTTTTC TCACGAATCT TGGTCATGTG 37620 

TGGGAAGTGA CCCAGATCTA GAACCTTAGC CTTTGGGACT TAAAAAAAAA ACAAAAAACT 37680 

GTTGAGTTGA ATCATTAAGT GTTACTGAGG GACAGGAGAG AGGAGGGTAG CTTTCTTAGT 37740 

TCCAAGACAA ATTTTGTTAA CAAAGATCTG TGGGTAGACT TGTGTCTGGG CAAAAGATCA 37800 

GAAGATGTOC TGTTCTAGGC CTCTTTGCCC TCAGACCCAT TCCCTATCCT TTCCCCTTCA 37860 

CTGTACCCCC TTATCTCCTC TTCTGCTGTC TTCCTCTGGG CCTGATGCTT GAGGATCCAG 37920 

AAGTTTCTCA GGCTCCCATG TTCCAGCAAT CCAGGCCTCC TTCCCAGTAA GGGATGAGTA 37980 

CAGGGGCCAC ACATAGCCCT GCAAGTTTTG TAATCCAACT TGAAATCCAA TGGCAGAATG 38040 

AATGGTTATA TATGGTGTGA CCCAGGACCA CATGCAGTTG TATCACATGC ACTTACA/^ 38100 

GAGCCCCATT TCTTGGACTC ATTCCCAGAC TCAATCTCTC TGAGGGTAGG ACCAGGAATT 38160 

CGGCCCTTTT CACAATCTTC CCAGGTGATT CTCTACATAG TATAATAACA CAAACTCATG 38220 

GAAATATATT TAATGAAAAA TGAATAAAAG AATAAATGAA ATAACAAATG GTGATGGCTG 38280 

GCACAATGTG TGTATCCATT CTCCTACTGA GGTGCACTTA CTTTGCTTCC T^TGTTCAT 38340 

TTGACAAGTA GTGATGCATT GAATATCCTT GTACATGTGA GCATGCAGTA AAGTTTCCAT 38400 

GGGCTTATAT TTGCTGGATT ATGGGCACGT GCATCTTCCT CTTTTCTAGA TATTAACAAA 38460 

TCACTCTCCA AAGTATTTAT AACAATCAAC ACTCCTGAAC AAGCAGTGGG TTGGAATTCC 38520 

TTCCTCATCA CATCCTGGCC AACAATTATT ATCATCAGAT TTTTTAATTT TGCCAATTTG 38580 

AAGGAAATGC AGTGGCTTCT CATGTGTTAG TGTTTCTGAT GATCAGTGAG GTTGAGTGTC 38640 

ATTTTTTTTT TTTTTTTTTT TTTTTTTTGA GATGGAGTTT TGCTCTTGTT GCCCAGGCTG 38700 

GAGTGCAATG GTGCTATCTT GGCTCACTGC AACCTCCGCC TCCCGGGTTC AAGTGATTCT 38760 

CCTGCTTCAG CCTCCCAAGT AGATGGGATT ACAGGCATGC ACCACCATGC CTGGCTAAGT 38820 

TTTATATTTT TAGTAGAGAC AGGGTTTCAC CATGTTGGTC AGGCTGGTCT CAAACTCCTG 38880 

ACCTCAAGTG ATCTGCCTGC CTCGGCCTCC CAAAGTGCTG GGATTACAGG CACGAGCCAC 38940 

TGCACCTGGC CGATTGAGCA TCTTTTTATG TGTTTAATGA TGCTCATTTT TTATTGACTT 39000 

CCTTCTGTGC TTTCTTTTTT TTAGCAGTGA ATTTGAGTTG TAAGAATATG TATTTCTTTC 39060 

ACTCTGGGAT TCACCTACAT AAAGTAATTT TCACTTGAAT GAAAAAGAAA TCAGTTGTAT 39120 

AAACATCTGT TTTTTCTGAA TTTTACTGGT GTAAAAATGG CCACTCAGCC CTGGAAGAAA 39180 

CAAAGGCACT TTGCCAACTG AAGTTGCAGA TGGGAAATTT TTAGAAAGGT CCTGTTCAAC 39240

CTCTGGAAGG GGAAGATCAT ATCTGAAAGT CAGGGTAATC CACCCAACCC AAATGTTTCT 39300

TCTACTATGG GTTCTGAGGA TTCGTCCATG TGCTTCTTCT GCATTGCTGC CATCTGATTT 39360

CCTTTGCTAG GCTCCTCTTG CAACTTGGGC TACAAAGAGG TGCTTCATAG TCCACAGTCT 39420 

TTGCCTCACC TTCAGTCTTG AGGTGGTCCC CTAGGAGTTA TTGGTAGTTG CCGCTGGAAG 39480 

CCATTCTAAC AAACCTGGCG AAGGCACAAA AGGATAGAAA GCCTTTAGCC AATATGGTGC 39540 

CATCAAAAAC AAACAGAGGA CGCTGCCCAG TCCTCTTCTG GTTGCCTTTA CTAATGCATC 39600 

AGTCATACTT CTTCTGCACT CGATCTTAGC CAAGAGGTCG AGAAGCCATA GTCATAATTC 39660 

TTCTGAAATT AATCTCTTCC TGCCCCACCT CCCCATCATC TGTCTTTGAA TTCCCAGGGC 39720 

TAGTACTCAT AAGATTATCT CTTTCTTCTC CTTTATGAGG AGACCCATTC TTTTTCACAA 39780 

ACCAGCCACA AAAGCAAGTG TCATTACCCC CTACCGGAAA TACCAGACAG AGAGTTCATC 39840 

TGGGGTTAGT TTCTAATCAA GCCTCCTGCC CGGGTTTTTC CTGCTCCTGT CTTGAAGCGA 39900 

CCACAGGGGG AGAGCAGTTT CCAAATATGA TCCCTCCTTT CCACTGTCAC TTGTCCAACC 39960 

CCGACCACTA TCATTCTTTT ATTTGCTTCT CCCCTGAGCC AGCCAAGAGC CTAGGTCAGT 40020 

GACAGGGCAG GCAGAAGAGA GAGGGGCTTC CAGGAAGGAG AGGGAGCAAC CCACAGAAGA 40080 

GGCAGCAAGA CAGGAAGGCG GGCAGGGGCT GAAAATCCAA TACATATCTA AGTACATTTT 40140 

TCTAGGATGG GCTTCTACAC TCAGCCAAAA CATATATTGC ATATTGTTTG TATTTTTTAG 40200 

AGGTTTACAG GTCTCCCTGA AAGTCCCTCT GTGGAATTAT AAACCTCTAA TAAAAAATCC 40260 

CAGGGTTAAA GAAAGGAAAA GATGAAGGAG AGGCCCACAC TCTGAAAGGA AAGGGTTCAG 40320 

CGACTCCTGG AAGGTTCTGG ATGGTGCTTC CTTGACCAAG TCAGCTGCTT CTTCTACCTG 40380 
GTCTCCTTTG
TCCAACGTAT 
AGGGTTTGTA 
CAGATCCTGG 
AGGAACCGGC 
CACAGCGCCA 
GGGTCCTCCC 
AGTGGCCACC 
AATCACTAAC 
CAGAGGTTAG 
GATCAAACTG 
TCCCAGGCTG 
CCATGCCTCA 
AAGGTCTGGG 
GAGATCCAAG 
TAGGGGAAAC 
GCCACATCTG 
GGGCTGGGGT 
TTTTGTTTTC 
CCGGAGAAGA 
CCTTACTTCA 
GTGACACTAG 
AAAGCAGGTA 
AGTCAGTTTA 
AATCTGGGCG 
GCAATGTCAG 
TCCCCAAATT 
TAGAGAAACA 

TTACATAGGT 
GATATTCCCC 
TGCGGTGTTT 
AAACTATTCA 
TGGAGGTGAA 
GTGGAGTGCA 
ACTGTGGCTG 
GCAAGCCAGA 
CAGGCAATCA 
ATCCACCACA 
GGGAGACGGG 
TACTAACATT 
CTTGGCGAAT 
GTGAGATCTC 
CTGGCCTTAG 
TTATTTCAGC 
CAAACCACTT 
GGTATTTTTT 
TGGAATCCTT 
GGGACAACTG 
GTGGTGGACC 
CACTTCCTGC 
GAAGTTGTTT 
AGGGGCACAA 
CGTAGGCAGC 
GTGCAGAACC 
ATTTAGAGTA 



TGGTTCAGCT 
GGGGGCTGTC 
CAGAAAGCTC 
TTGTATCCTC 
GCCTGCAGAC 
AACCTCTTGG 
TGTCCTGCAA 
TGTGCTCTTG 
TAGCTGGAGA 
TGCTTCCCTG 
AGTTTGTTCA 
AGGTCCAGTG 
GCCCTGAAGA 
CCTGGAATTC 
ATAAATCAGA 
AAATCTAAAC 
CCTGCTGGAA 
CACATTCTGG 
AGGAGCTGTT 
ATACGTTGAC 
AGGAAGGCTC 
ACAAAGTGCT 
CTCCATTATC 
AATAACTGGC 
CTCTGGCTTC 
CCACACCTGG 
AGCACCATGA 
CTGCCCTTTT 
TTTTATTATA 
ATATATGTGC 
TCCCTGTGTC 
GGTTTTCTGT 
TTCCACCTGC 
GAGACAAATT 
AGTCAGGACA 
AGTCATTTCA 
GATGTCCGGC 
GTGGGGATCT 
AGGAAAGAGC 
ATTATCTGCT 
ACACAGGGGT 
GTCTGCCCCA 
TAGATCACAA 
AGCCTCACTT 
TTGCCTTCCC 
TGAGTCTCCT 
TCGCTTTTAC 
CAAACAACCT 
AGGCACAGGG 
CAGGATTTGA 
CTTGATAAGA 
CCAATGAAGC 
AATACAGCTT 
GAGGCACCTG 
CATTTACCTA 
GGAGGCTGTT 



GGGGTGGGGC 

TGTAAGTGTA 

CTGGTGGTGG 

TGTCCCTCTC 

ATGCCTCTCT 

CCTTACCCCA 

AGAGAACTGG 

GGGAGCTCGG 

TAAGAGAGAG 

CCTGCACACG 

CTGGAGAGAG

GCAGCCGCTG 

CAAACAGGAG 

AGCTTGGCCC 

CAGGGTCTCT 

GACTGCCAGC 

CATTTTCATC 

GCTAATCTGA 

CAAGGGTGGT 

TTTTCTGGGG 

TCCTTCCACC 

TAACACTTAT 

ATCACCACCC 

TCAAGGCTGC

AAAGTGTGCT 

AAGCTTGCTA 

TTTTAACAAG 

CACATTATAT 

CTTTGAGTTC 

CATGGTGGCT 

CATGTGTTCT 

TCTTGTGTTA 

CAACAATTAG 

TCCTTATAGA 

CTTGGATCTA 

CCTTCTTGGG 

CCCAGCAGCA 

GCTGAATGAG 

CCTCCATTTC 

GTGTGTCAGG 

CCTCGCTTGC 

CATCTGTGTG 

GGTGGGGAAA 

GTTAAAGGGA 

TCTCACTTTT 

TTGGTTACCA 

TCCAAGCCAG 

GATGATGAGT 

TTGTTAAGCA 

ATCCAGGTTT 

TAATTGTGGT 

ATATCAGCTT 

TCCGGGCACC 

TAGTTTGGTA 

AGGAGAACGC 

GGCTTCTGAG 



TTACTAGATU^ 

GGTGTTATCT 

GGAGATAATG 

CACCCCCACC 

GATGCCCTCC 

CTGGGCCCAT 

GCCCTCAGTC 

GGGAGGCTGG 

AGAATGAAAC 

CCAGAACCTG 

CTGACATACA 

CCCCTTTCCT 

CAGTTTTCAA 

ATTTACTATG 

GCTGTCAGTG 

TGGAACTTCA 

TGCTGGGCTC 

TGATTATTTT 

CTGATGGTTT 

TGGGGTCTGG 

TTCTGCCCTC 

TACCTGACTT 

TTCTTTTTAC 

ACGGCCGATA

GTCCATTGTT 

GGACTATAGA 

ATCTCAGGTG 

GGCTCTGTGC 

TGGGGTACAT 

TGCTGCACCC 

CATTGTTCAA 

GTTTCACAAT 

CGAGCTCCAG 

ACTTGGCCAT 

AATCCAGTGC 

TCCCAGGTTC 

TATTCTATGT 

GGAACCAGTA 

CAAATGAAGA

GAAGAGTAGG 

CCTTCTCAAT 

TCACTCAGCA 

GGGGTGAGGA 

AAGGGGCAAA

CTCTGTCCCC 

AGATAAAACC 

TGCATAGTGC 

GGGTGCCATT 

GCTAACCTGA 

GCTCAACTCC 

TGTTACTTGG 

CTAAATCTGG 

ATCTTGAAAT 

AAGCTCCCAG 

GGGTTCAAAG 

AATGAGGGCT 



AAGCTGTGGG 

GATGAAAGCT 

TCAAGCTTCT 

CACTCACCCA 

CAGTAACCCC 

GACCCAGTGG 

AGGTTCTTCT 

GAAACTTTCA 

AATTGAGAAA 

GCCCGCCCAG 

GTCTCTAAGG 

CCTAGGGCCC 

GGAGCCCTTC 

CCAGCTCTGT 

TGCTCAAGGA 

ACTCTGTAAA 

ACGTAGCTGT 

GGCTAGAGTG 

GGATCAAGAC 

GGCAGAAAGC 

TGAGTGCCTT 

GAATCTCCCA 

AGGCAAGAAA 

AGTAGCAAAT 

CAGGTTCTGG 

ATCCCCAGCT 

ATTGGTGTAC 

TCAGTACAGA 

GCGCAGAACA 

ATCAACAGGC 

CTCCCACTTA 

CATTCTCAGA 

ACATTGTGCC 

GCCCTTCATG 

TACCACCTGC 

CTAATCGGTA 

GAACAGGATG 

AATGAGTGAG 

AAAGAAGTAT 

GCCTTCCCAA 

GGTCCACTCA 

ACTTTGGCCA 

ATGACCTAGA 

TAAGATCTGA 

TTCTCCTCTT 

AATCCACATT 

ATTTTGCTCA 

GATACCCCCA 

GGCCACTCGG 

AAAGCCTGTG 

CCAAATAAAA 

CTGAACATTG 

GACTGATTCA 

GTGATTCTGA 

GGACTGGACG 

AATTAACTTT 



AGGTGGTTGC 

GCCCCGGGTG 

CTCTCTCTCC 

CAGACTTCCA 

TGGCAGGCAG 

CTGTGCCTCT 

GCTCCAACCC 

AAGAGCAGTT 

ATGCCCAACC 

AGATVACTGGC 

GGCTGCAGTA 

TTTCCTTCAG 

CCTTATCTCT 

GCAGGGTGCA 

AAGAGGCTTT 

GCAGCACCCT 

GCAACAGCTG 

AGCTCATCCT 

TAGCTGTATC 

AAGAAGGCTG 

GTATGCGCAA 

ATGGCCCTGT 

ACCAAGGCAC 

TTGGACTTCG 

TCTGGTACTG 

GACCCCAAAC 

ACATTACAGT 

TTTAATTTTC 

TGCAGGTTTG 

CCCGGTGTGT 

TGAGTGAGAA 

TTTAGCTTTC 

AGGTGAATGA 

CAGGCAGTGT 

CGGCTGCGAG 

AAACCGGGAG 

AGGTGCCCAG 

TGAACCGATC 

GCTAGTGGAG 

GCTCCCTTAA 

GATGATTTCT 

CCTATCCAGT 

ATCCTGGCCT 

ACATCAAAAA 

GTCTTCCCTG 

AACTATGGCT 

CATTAGATTA 

TTTTATAGCT 

TCACTTCCTT 

TACTAAACGA 

AGCCTATGGA 

GACTCTCCAA 

GCAAATTGGT 

TAATGAGCTT 

GCTCTTCCTT 

GGGGAGCTTC 



40440 

40500 

40560 

40620 

40680 

40740 

40800 

40860 

40920 

40980 

41040 

41100 

41160 

41220 

41280 

41340 

41400 

41460 

41520 

41580 

41640 

41700 

41760 

41820 

41880 

41940 

42000 

42060 

42120 

42180 

42240 

42300 

42360 

42420 

42480 

42540 

42600 

42660 

42720 

42780 

42840 

42900 

42960 

43020 

43080 

43140 

43200 

43260 

43320 

43380 

43440 

43500 

43560 

43620 

43680 

43740 
CTGCAGTGAC CTTTGCCTTC GGGGAAAGTG TGGGGATTGA GATAAGAGAG AGAAATCCTT 43800

GGCGGCTAGG AGGAAGGGTA GGGTGTTTGC TGTCAGGCTC CAGGCTTAGC CCTCGTGGTG 43860 

TCCCTCCTGG AGATGGTGTG CACTGAGTGC AGTGGCTGCT GGAGAGTGGG TGGAGAGATG 43920 

AAGGTGATAG GGGTGGGATT AATTAAAATA TCAGGCAGTG TGGCTGGGCG CAGTGGTTCA 43980

CACCTGTAAT CCCAGCGCTT TAGGAGGCCA AGGCAGGTGG ATCACCTGAG ATTGGGAATT 44040 

CGAGTTTAGC GTGGCCAACA TGACGAAACC CTGTCTCTAC TAAAAAAATG TAAAAATTAG 44100 

CTGGGTGTGG TGGTGCACTG CAATCCCAGC TACTCGGGAG GGTGAGGTAT GAGAATTTCT 44160 

CAAACCCAGG AGGCAGAGGC TGCAAGTGAG CGGAGATCAC ACCACTGCAC TCCGGCTGGG 44220 

ACAACAGAGA GAGACTCTGT CTCAAAGAAA AGAAGCAGTG AACCTTTAGA TTATCCCACT 44280 

CTAAAAGTGA GGCT^CCTTA GTTTTTCTGG GTCTTTAGAA GCAGAAGTGC CCTTGGGTAT 44340

TTCTAGGCTG AGGGCCCCAC CTAGTTCAAG CCTTCTAAAC ATCCAGTGTT TTGCTATATT 44400 

CATTTACCAC TTGTCCTATT AGACTCTTAG GTCTTTTTTT TTAATGACTC ACTTATTAAA 44460 

GAATGTGCAT TTATTTACAA GGCAATAATA TCACTACCTT TAATGGAAAA TTAGCAACCC 44520 

TGGCTACACC TAGAAGGTAA CTGTTAATAA ATAGGATGAA ACCCAAGGCT GGAATTAACT 44580 

TCTCATTGGA TCCTGCAGCC TATGCTCCTT TCACTGAAGG GTGATATCAG CCAACTGAGA 44640 

CCTCCTCTAA AGTCTGTGAA GGATTGAATT AAGAGAATTG GAAAGGGCAC ACATTTCTCA 44700 

TGATGTGATT CAATATTGAT TAATTCCAGG TTCACCTATT ATCTAAAACC ATGTTACTGA 44760 

AAGTGGCTTA TAAATACCGC AGCACCAGAA TGTAAACTCC ACAAGGGCAG AGTTTTTGGT 44820 

TTTGTTTTGT CTTTTAAAAA TCTGTTCATT GCTTTATTCC TAGACTCTGG AACAGTACTT 44880 

GGAATATAGT AGGTGTTCAC ATATTTATTG ACTACGTGGA CTCTTTTTAG ACTGAGAAGC 44940 

GGAATATAAA GTCAGAGGGT CCGACTGGTG ATCGAATGCC TTCGTTCTGT ACTCAAGCCC 45000 

ACTCACCCAC TTAGTTTTGA GAACTCTGGT GACCCAACCT ACAGCCTGTC CCACCTTCAA 45060 

CTTATTCCCA TTCCTTGGGT GCACGTGTTG CTGTGAGGAT CAGATGAGGT CATGGATGGG 45120 

CAGGACTCTG AACTGCGTGC CCTCTGCACA GGGAAACAGC TGGGCCGATT ATAAATTGCA 45180

AAGGGGATGC CTGATGGTGG CCCCATGACT TTTCATATGC TTTGGGCTGT TGTGAGAGAG 45240 

AGTGCCCAAA GCCTGATTCT GGAACATTTT CTTTGCTGTC TTCTAAATGA GAACCTGCTT 45300 

GCTTCAATTC TCCCACTGAG CAATCATGCT GACATGAGGG AGGCGGAGTC AGACCTTACA 45360 

TTGTTGAGAC CAGATTCTGT GTTCTACGAG TATTGGGAAG GGTGATGCAG GCAGGCACCC 45420 

ACCATGTTCC CTGTGAGTGC TTATTTTTAA TAAAAACCTT GGTATACTGC TATTAATGAA 45480 

AATAATAATA ATAATAATAA TTACTCCTGC TAATAATATA AGGAAACACC CACTGGTCTG 45540 

TGACTGAGCC AGCCTTGCCT GAAGGCAGGG GAATGAATTC AATGACCTCT TGACACTGGT 45600 

CTCAGCCCTT TGGTTCTATT ACCACCTTGT AAACCTGAGG TTGTTCTGTT TTTATCCCTA 45660 

GGGAGTTGTG GTTAGAACCT GCCAGAAATT TCTCACTATG AATCAATCTT CCATTGGTCA 45720 

CTGCCCTTTT CAACATGCCT GTCATTCAAG ACTTACGATT TCCTAGGCAT TGACAGAGAG 45780 

AAACTGGCCA TGTGGACCAA GGCAGTGGGA TTTACGTGAC ACCCGCCAAG CCGGTGGGGC 45840 

TAAGTTCCAT TGCTGAAGTC TGATACCTGT CATCTGCTGT GGGGTGACAT CCACACCATG 45900 

TCATTCTCCA TTCGTTCAAT ACATATTTGT GGATTCCTAA AATGCCCCTG CTGCTGTGAT 45960 

AGTCCAGCTC AAGAGAGAGG AAGTACATGA GATGTTACCA CACAGTGTGG TATGTGCTGG 46020 

AGAGGTGAAG ACTCT6GAGC AGAGAGGCAA CAACTCAGGT G6GGACTGAA TGGTGGCGGG 46080 

GTGAGCTCAT CAGGAAAGGC CCCCCCAGGG AAGCTGTGTT TGGGCTGGGG TCTAAGGATG 46140

AGCAGCAGTT AGCCAGGGAA GACAAGGAGT AAATGTACCT AGGCATGTGG GGCAGTCTAT 46200 

GCAATAATGT GGGGAGGAAG CAAAGAGAAA GAGAATGGGA GAATGGCCTG CCTGTTTGGG 46260 

GAAATGAAAG GAGCCAGTAT GTAAAAATCA GGTGAGAGAC AGCTGGAGAT GAGGCTGCAG 46320 

AAATAGGTAG GTGCCAGGTC ACAGAGGGCC TTGTGAATAG TATCATGGAC GCTGGACTTT 46380 

ACTCCAAAGG GCATGGGAGC CATCAAAGGG TGTTGAACAA GGAGATGCAC ATTATAGAAA 46440 

GGCCAGGAAG GCCTCTGGGA TCTCCTCTTC TCCAAACTGT GGCTCTGGGG ACAGCTCCCT 46500 

ATAGTGGTCT TGGGCAGCAC CAAACTGGTG TTTAGGCTCA GCTCACATGC AGCTCACAGC 46560 

AAGATGGTGA CAAATGACTC ATCCTCAAAC AACAGAGCAG GCATAGGAAG GAGGCCCCAG 46620 

TTAGGATCTT GCTTACCTGG TTTGCTGGTG GCCTATGCAT TTAATTGTAG AACAGAATGC 46680 

CAAGCCACTT TTTAACCTTT CTTCTACACC ATGCCCTGCA CCTCCCCTTC TCTCTCTGCT 46740 

CTTCTCCCTT CCACCCTCAA ATTTCTAAGC CATGTCCAGG TCTCGTTTTC ACCTGTGCCA 46800 

GAGAAGATCT ATCTGACTTT GGCCATGGAA GA6GTATAGC AGGTATCAGT TGGAGA6GGC 46860 

TGGAAAAGCT CCCTGGTGCT AGATATGGAC GACCTGAGCT TCCAGTCCTG GCTCTTGCAG 46920 

CCACCAGGCA TTTGACATGG GCAGAAGCAC TTTTCCTCAC TGAGCCTCTG TTTCCTCATC 46980 

TGTAAAATGG GAATCATGGT GATGGTGTGA TATTTGAACA AGTTTTTTTT TTTTTTTCAA 47040 

AATTGCTTTG TAAACTGCAA AGCTCTGAAT AAGTGTTTAT TTGGGATTAT TAGGTUICTGC 47100 
TTTGCTGGAA CAGTCTACCA GAGGGATGGA AGGAGAGGAA CTGA6AAATC GATTCTTTGA 47160

AATATTTTTA TCATATGAGA TACAAATATG TATCTATATA AATATAGATA TAAATATGAA 47220 

CAAATATATC TGTCATAAAA TTTAAAAAAG GATGAACCTT GCCCCCAATC TCACCCCTAG 47280 

CAGCAACTAT TAATTTTTTG TTGTATATCT GCCCAGACAC ATATAAAATA TATATTCAAA 47340 

CAAAAAATAT AATCATATTA TAAACTTTGT TTTTTAGCTT GTTTATTCAC ATTACATGGA 47400 

AATCTTTCAG CATCATGGCA TATAGATCTG TCTTTTTAAT ATTACTTCAT GGTCTAGGTG 47460 

AACCATAGTT TATTTAGCAT TTTCCTTTTG GTAAACATTA AAGTTAGTTG CAATTTTTCA 47520 

TCATATATTT TTTCTGGTCT TTTGTACATA TATCTATGAG AGAAATTCCT AGAAATAGGG 47580 

TTGCTGACTC AAAGGATACC AGCATTTTAA ATTTTGGTAG GTACTACCAA ATTGCTCTTC 47640 

ATAAAGAGTG TACAAATACA CCCTCCCACA AACAGAGTGC CTGCCTTCCA TGCCTGGACC 47700 

AACCACAGGC ATTACCACCT CTGCTGAAGC TTTTTCATGA GACAAGGTCT TGCTCTGTTG 47760 

CCCAGGCTGG AGTGCAGTGG CGTGATCTCT GCTCACTTCA ACCTCTGCCT CCCAGGTTCA 47820 

AGTGACTGTC ATGTCTCAGC CTCTGGAGTA GCTGGGACTA CAGGTGCGTG CCACCAAACC 47880 

TGGCTAATTT TTGTATTTTT GGTAGAGATG GGGTTTTGCC ATGTTGGCCA GGCTGGTCTC 47940 

GAACTCCTGG CCTCAAGTGA TTTGACTGCC TTGGCCTCCC AAAGTGCTGG AATTACAGGC 48000 

GTGAGCCACC ATGTCTGGAC TGCTGAGGTT XTTTTTTTTT TTTTGAGACC AAGTTTCACT 48060 

CTTGTAGCCC AGGCTGCAGT GCAATGGCAT GATCTTGGCT CACTGCAACC TCCGCCTCCC 48120 

AGGTTCAAGG GATTCTCCTG CCTCAGCCTT CCAAGTAGCT GGGATTATAG GCATGTGCCA 48180 

CCATGCCCAG CTAATTTTGT ATTTTTAGTA GAGATGGGGT TTCTTCATGT TGGTTAGGCT 48240 

GGTCTCGAAC TCCCAACCTC AGGTGATCTG CCC6CCTTGG CCTCTCAAAG TGCTGGGATT 48300 

ACAGGCATGA ACCACTGCGC CCAGCCTTGC TGAGGCTTTT AAAACCATGA AACGCTCCTC 48360

CTCCCTCAAA TGGTCATGTG GCCACTGCCT GCTTCATCAC ACTGCTCCTC TGTCTGACAA 48420 

GCCTGTTCTT ATATAACACC AGTAGGTAGG GCCATCCGAG ACATGGTTAT CCAATAAAAT 48480 

GGTAAGAACC AGCCCTAGGG TATTTGGGAA ACTGGCTGTG AGGGTTCAAT GGAATATTCA 48540 

CATTTCCAAA CATAAAATCT AGCAGCAATG GAGAAACGTA CTTTAAGCAG AGAGTTTTGC 48600 

GCCTGACACA AGAAATTATT ATTATTGTTG TTATTGAAAG TTCTGACACA CAGATCTCGG 48660 

TTGTGTTTGG AAGGAGGATA GTCAGAGAGA GGAGGAAGGT ATGAAGAGGT CGAGGTGTTA 48720 

GTTTTAAAAA GTGTGTCTTT GTCATTGTCG AGCTGTGGCT GGTCCCACAA CCTGGTTCTA 48780 

TCAGGCCTTT GGTGTTACAA AATGCAAAAC ACCAGGCAAC CAAATAGCGT TTCCATGGAA 48840 

GTATCCCATG ACCTCTGGTG CTGTGTACAG GTGAGACAGT GAGCACTCAG AAAGGGATGG 48900 

CCTGGGTGGG GAGGGCGAAA GGGGCCTCTC CAGCCTCTGC AACATAAAAC AAGGGGCCAA 48960 

TGGAAAGTTC TGGAACTGGA TCACTAAGAA GACAGGCCCC ACTGCTGGCA TGAGTGGGAT 49020 

GACCAAAGAA TTAGGAAACT GAGATTGGAG TTGGTCACCA ATTCAACTGG CCCATTTAAA 49080 

AATTTTCATA AGCAGGGACA GAGGATCAAG CCAAGAGCAC TAG6GAGATG GTGATGAATG 49140 

GAAATTGTGT AAGGTAGATG GCTATGTGCC GGGGAAGGAG GAGAGAGGAT TCAGAATTAT 49200 

AGGAATAATA CATGAAATGA CTGACAAAAG TAGCCTTTTA TGTGTGTTAT GTAATTTAAT 49260 

CCTCTTAACC TTATAGAGTT AGCACTGTCA GGATCCACAT TAAAAAAAAA AAAGACGAAG 49320 

CAGAAGCTCG GAGAAGTCAA ATTACTTGGC CAAGGTCAAG GTCACACAGC CACATGTGGC 49380 

AAATCTGGAA TACAAACTTA GGTCTATCTG ACTTTAAACC AAAATGCTGC ATATAGCTTC 49440 

GATTTCAGCA CAGCAGGGTT CAACTTGGAG ATAGAGGGTG GTGTTATAGA TTACCAGATA 49500 

CGATAGTGGT AGGTTTTCTT CTGTCTTGAT GAAAGATGAG CTATTTTTAT CCTGTTGCAG 49560 

GACAACGCGA AGGATCATGA CTTCCATTTT TGAACTGACA TTGTAGATTT GTGTATATTT 49620 

GACAGCTCTA CCACATTCCC AACCCTATGC CCTCCTATCA CTCTTTTTGA 6AATACTGGG 49680 

CTAGTTGGGG GCAGTGTTGG GGGACTTGGG CCTGGGCGTA TGCTGGGAGG AAAGGCAAGG 49740 

AGATTATGCA GTGTGGTAGT AGGGACTGGG GGAAGTTTTT TTGTTTTTTG TTTTTGTTTT 49800 

TAAAATCCTA GTTGGTCCCC AGTGGAGCCT CCAGACCTCC TCAAAGTCTT TGAGGTTGTG 49860

ATTAATTACC ATATAAACTA GACAGTCCTT GGCCTTGGTG TTGCCATTCC AGCCTGTAAT 49920 

TATCTTCATC ACAAGTTGCT GTCTGGCTTT GTTCTGTAGG TAGAGGCTCT TTCGTAGGTC 49980 

CCTGCATGTC CCTGAGTCAC TAGCAGGCTC ACTTGTGCTT ATCCAAACTG GTGAATCATT 50040 

AGCTGTCACC CTGGAGAGCA GTGCAGTTTG GGAAGGCGTG GGTGCGCCCA TGGAGAGGGT 50100 

GATCCCCTCT CTCTTCTTTC CAGGCATGCG TAAGGAGCAG TGGCAGAGAA TTACGGAACA 50160 

GAGGATGCTA TCATAGGTGA CCTATGAGCC AGGCACGTAC ATACGTGTCA TCTCAATGAA 50220 

AGCTTTACAG CACAGGTTAT ACAAGTAGTA CACAGGGATA AACAQCAAGG TTCTTAGGTG 50280 

GGTTTCAGAC CTGGCTCTGT CATTTATCTA GAGGTATGAC CTTGGCCCAA CCTTCCTAAC 50340 

TTGTCTATGC CTTGATTTCC TCAACTATAA AATAGAGATA AAAATGGTAA CTGCATCCAA 50400 

GAGCTTTTGA GAGGAATTGA TGCAAAGATG CAAGTACAGT GCCTAGCAAA CTGAAGCACT 50460 
CCATGAGGAG TGGTGATGCG GATGCTAATG CTGATGCTGG GACAAACTTA CACCCACTTT 50520

ACAGATGGGA GAACTGAGCC TCAAGTTGTT TAAAGTGGCA TAGCTAGTAA GTGGTAGACT 50580

TGGGATGAAA ACCCCAGTCT GTTTCCAAGT CAGGAACCCT TTCCTCCATA ATGCCGTCTG 50640

CATAAATTAG ACTGTT6GAC TGAAAAACAA TCCGTTCAAA CCACAAGGGT ACATTGGCGC 50700

AGGTTGCTTC TATGTTTTAT CCTCAATCTG AAGCAATATA ATGAGCAATG TAATGAGATT 50760

ATGTTAATAT TTACTCAGGG TTCTGGGAAA CCCAGAAGGG TTTCAGGGTA AACCATCTCC 50820

CAGCAAGCAA GGGCTCGCCC GCTAATTCCC CTTTCTTCCA AGACTGATCA GATTGCCCAG 50880

TGCCTAGTAA AATGCCAGTT TCCTTCTATG TGGAAGGGAG CAAAGCTGTC AGCTCCTGCT 50940

GGGGCACAGG 6AGAGGATGT TTCTTGTGGA TAGGTAGGTG GTGCTTAGGG GTAGAGGCTC 51000

TGAGATCAGG CAGACATGGT TTCTATCTGT CCTCCCAGCA GTGTGTCCTT GGGTAAGTTA 51060

CTTAATGTTT CTCAGCTTCA ATGTCCTCAT CTTAAGATGA GGGATTATCA TGCTACTTTG 51120

TGGGGCCTTT GTGAGGATTA AATGAGATCT TAGTATCTGG CACATAGTAA GTGCTTAATA 51180

AAAATAATAA GGCAGAGCTG GGTAGATTGA GGGTTTGGTT TACAGCACTT TGACAGCAAG 51240

TTGCTTGTTT CCTGCCATTC AGAGACCCTG GCCAAACTAT GTCCATTGTG GCCACAAGAC 51300

CATTGGCATG TCAGCCTCCA AAAGAGAGAT GACTGCTCAG CAGGCATTAA CCAGATCAGA 51360

GGTTCTTTGA TTCAGCACAG TGCTCTCTTT TTGCACTGCT CTCAGTCTAC CAACAGTATC 51420

AATCACAGCA ACCATTCATG GTGCAAGGTG ATCTCCCTAA ACTTACATTA TATCTTTAAT 51480

CCTCACAGCA GCCTTGGGGG ATGGTATTAT TTCCATCTGT AGATGAGACA ATAGGGGCTC 51540

AGAGATGGTA GGTAATTGCC CAAGGACACA TAGCTGTTGG AGAAAGTAGT ATTGGAGCAA 51600

AATCTATGTG TGTGCATCTA GATTGACCAA CCTTCCTGGT TTGCCTGGGA ATATGGGGTT 51660

TTCTAGGATG TGGGGCATTC AGTGCTAAAA TCAGGAAAGT CTAAGATGAG TTGGTTACTC 51720

TATATGCGGC CTCTCCGTGG AGGGTTGGTT GGTGGGCCTG GAAAAGGGAT AGGGATAAGA 51780

6AGAGAAGAG GAGGACGCAG AGAGAATGGC AGAAGCAACT CTGCACTGTT TCTTTCTGCA 51840

AAGATGTCTT TTCAATTCAA CCTGCTTGTT CAGTTCAACA AGCAGGTTTG AAT6CCCTCG 51900

TCCTTGGAGG GAGTCACGTC AGGACTTTCC GGGTATTTGA CCGTGATGAA GAGCGCTGTC 51960

TGCCAGGGTT CGCCAGGCTG GGTGTGGAAA AATGGTGCCC CAAACCAGCC CCACATGGCA 52020

GAATAGGAAA CATGCTGTCA TCTTGCTTCA TCTGAATCTC CATTCCATGA GGGCAGGAAT 52080

TGTTTTCTTT TTTACTTCTA TAGCTGAAGC CCCAGTGCCC AGAATATGGC AGAAACTCCA 52140

GAAACATTGG TGGAATGTAG ACTATTGAAT AATTCCAAGT ACAAACCAAT GGTCCAGGGA 52200

GATTTAGATT CTGATGAAGG CAATCTGGGG AAGACTGAAT GGAGAAATAG CATTGGAAAC 52260

GGTTTGGATA CCACGTGTTG GGATCAGGAA GCAGAGGAGC ACAGAATGCT TGTGCAGAAG 52320

TGACATGGGC CCACTGCACC TGGGGTGGAC CCTGTGAGGT AGAGTTGGAG ACCAAGGGCC 52380

TGAGGACTGG ACATGTCGGT GGAGACCAGG TGGTGGAGGA TGGAGAATGC CATGCCCTCA 52440

GGGAGTTTGG ACTGCCTGTC GTTAAGCCAT TTTTTTCTCC AAATTTCAAT CCCCCTCATT 52500

CCATTGTCAC CATATTTGCC ATGTCTGTGT ACCTACCTAT ATTACTTATT TAACACTTTT 52560

CCTTCAAGTG ACTTACTTTT TAACTTTACA TTTGTTTTCA TATCAAACAC ACATGGCTGT 52620

TAAAATAAAA ATTACGATTT GAACTTAGAA TCATCTTGCC TACCACATGA GGTAGGTGTA 52680

CTTCCCTCTG AGGACCT^CAG CTCCAGCAAC TGGGGAACCG ACAAAGATTT TTGAAAGAAG 52740

AAATGATTCA GTTGCTTTTT GGGAAGACTA CACACGTGAG GAAGTACTGA GTGGAAGATA 52800

TGTGCATAAA ACATTGGCGC AATTGTGACT AACATGGTAA GAAATATTAT CAACGCAAGT 52860

TTGGGGGGCA TTTCAAAGTC TCTCAATGGT CATCCGGATG AAATATGCAA GAACTGCTCT 52920

CTCTCTCTCT CTCTCTGTCT TTTCTCTTCT TGGTCTCACT TTGCCCTCTT TCCCAGCAGC 52980

TCTGCCTTCT CCCCCATGCT TGCTGCCAAC AGCTCTGAGG AATGGGAGGG ATTGCAGTTC 53040

AAAGAGTAAA CAGGTCTACT CTGAGTAAGG CT6TGGGCTG TGCAGTGACC CCCAGTGGGT 53100

CTGGGTGCCT GGTAATGATG CCTGCACTGG CATGATGCTG TGGCTTTCCA GGCTTGTTTT 53160

ACCTGGTTGT GCAAAGAATG TTACCCCCAG CCAAGGCTCA AGTTCACAGA CCATTGGCCC 53220

ATCCCCTAAT AAGCATATTA TTCCCAGCTG GGCATTGAAC TTCCAAGTTA AGGTGACCTG 53280

CCAAACTGGA AAGAAAATGG ATTTGCAAAA ATCAGATGTT TGCCAACAGC ACCATCCCCC 53340

ACCACAACCA TAGACAATTG TGAGATCTAA AGTTGGACTC CCTGAGGTTT TCTGCCCTGG 53400

TGGTTCTGGC AACTCCTGGA GAGCCACAGA CTGATGAATT TGAGGATCAT AAACCTTAAG 53460

AAGACTTTAA AGTATTTTTG GCATTAATTG ACAAAGTCCA CAGCAAGCCA GGCATGCTCT 53520

TCTCTCCCAC TCCCCTTGTC AGAGATGTCT CTTTCCCCTT QCTCTTCTTA CCCCATTCTT 53580

TCCAGCATAA CCAAGCTTAA TAGCTTCCAT GTTTCCACTG TAAGGAAGTG AGCCGAGTGT 53640

GGTTGGTCTG TTTCACAGCA GGGCTATCCT CACACGAAAA GTTTTCAGAT GCATTGACTA 53700

TGCAGATTTT TGGCTCAGTT TGCAGAAGAC TTCCTTATTT CAGTTTTACT GTACACCCAC 53760

CTACATAATA CTTTTTGGTT CTTAGAATTT CAGAGCTATT AACCTCTAAA CTTAAATCAA 53820
AATTCTCATC AAACTTTCCT AGGGCCTTGT CATAAAAGT^ ACTAAGTCTC AAAATAGGAC 53880

TTTTTGGCCT AATTTCCTTG TCCAGGAAGA CAGATTGACT AATTCCAAAC CCTGGACTCA 53940 

CATGTGATTG CTAAGAATAG GGGTGGGGGA GAGAGAGGGG ACAAAAGTCA TAGACATGCC 54000

ATGACACATA TTGGGAGATT TCATCTGAAT TTCCCCTGAG TATGAAATTA TTCAGAAATA 54060 

ATTCCAAGGG CTTCTTTTCT GACATTCCAC CAGTGTGCAG GTGCATATGT TTTGAATGAA 54120 

CTGAATGGAT AATTTATTTA TACAAATGAG TCTTTTTGAA TAGTTGCAAT GGATGTGCTG 54180 

TCAACTCTCC AATATCACTT CCAGGGGGTT TGTAGATGCA TTCTTTCCAT GGGCATCAGC 54240 

AGGTCTGGAT CTCCTGCTTT CTATCTGAAA GGCACTGGTC TGGATCTCCT GCTTTCTATC 54300 

TGAAAGGCAT TATGGGCAGC AGTCTGGTTG ATTTTATACT ACTTTGATAC ACTCTCAATT 54360 

GCATACTAAG AATGAGGATG GAGAAACTGA TAGTQACCCT CACCCCAATT AGGTTTCACT 54420 

ACTGCCCTTG ACCTTCATAT TTAATGCCTT TGTTATCACA GCAACTCTTT GCTCTATTTC 54480 

TGGATCCAAA TGCCTAAGGA TCTCCCTGGG GTGTTAAGCT TGCTCAGTGC TATTT^CCT 54540 

GGTGGGTGGC AGAGTGACCT TTGTATCACA AGAGCCTCAT GACTTCCCAG GCAAGACCAA 54600 

GTCACAACTT TTCCAATGGA TTTCCCCTCG ATTCTTATTC TGAGCATTTA GCTTTTTAAA 54660 

TATTTGGCTC TGAAGGGCAG GGGCTAAACA TTGTTCTGTA AGATCCAAAC CTGCTTGTAT 54720 

ATTTTATACT TTTGTTTTTT CATTTCAACT TTCCGATCTC GCTCTTCTGA GAAACATTCA 54780 

CATTTCCAAT TGCATTCCAG AACTGAGCTT GACTTTCCAT GTCCATGTAA GATCTTGTAA 54840 

TTCAAATTTC AGCCAGCTGC TAAGCTTCTC TTTTCTGGAG GGGATTGTGG TTAAGAGATC 54900 

TTGTTTTGCA ATGACGCTGT CTGGTCTGAG CTCCAGCTAC TTGTCTTATT TACTGTGCAA 54960 

CCTTGGCCAT GTTAACTTAT CAGGCTCATG AGGCTCGGTT TTCTCATCTA TAAAGTGAGA 55020 

AAATGAATAG TACCTATCTG ATGGAGTTTT TCTAAGGCTT AAATQAAGTA ATGCAAATTA 55080 

AATTCTTAGT CTAGTCACTG GGAAAAGATG AAAACTTAAC GAATATGAAT AGTCACTATT 55140 

CTGTTTCTTT TTTTCTATGC CATTCCGGCT TCACCTCCTT CTCTTACTTT TTCCCTTTCT 55200 

TTTTTCATTT GTTTTCTTTT TTTTTTTTTT CTTTTTTGAG ATGGAGTTTC GCTCTTGTTG 55260 

CCCAGACTGG AGTACAATGG CATGATCTTG GCTCGCTGCA ACCTCCACCT CCCAGGTTCA 55320 

AGCGATTCTC CTGCCTCAGC CTCTCGAGTA CCTGGGATTA CAGGTGCCCA CCACCATGCC 55380 

TGGCTAATTT TTGTATTTTT AGTAGAGATG GGGTTTCACC ATGTTGGCCA GGCTGGTCTT 55440 

GAACTTCTGA CCTCAGGTGA TCCACCCACC TCAGCCTCCC AAAGTGCTGG GATAACAGGA 55500 

TGCCGCACCA CTGTGCCTGG CCTCTTCTAC TTTTTCTTAG AAACATGGAG GGTTAGTTCT 55560 

CTGGCCACTC ATATGAAACT TCATTCCCT6 CTAAGGTGGA AGTATTGGAG TTCAAGCTCT 55620 

ACACTTAGTG GAGGGAGTAA ATAAGCATTT CCAGAGAGCC CACCAAGTGC CATGCAATCT 55680 

CCTAATGCTT TGTACTATTT CTCATTTAAC CCCCCAAACA GCTCACTGAG TATGTTAATA 55740 

TCCCCAATAA ACAGATAGGG AAACTGAGAC CTAAAGTTTG AGCAAATATG GCAAAGTTTT 55800 

CCTAGGCTGT CTGGCTTTAA AAACAATGTC CTTTCACCGC ATCAGGCTGC TTCTGAGGAG 55860 

CAGAGCCACC TTGCTTTTGT AAGTCTGTTG GAATAGGCTC TGAQATGCCA CACGTTATCC 55920 

CAAATAATTA GGCATCTGGA TGGAGATTTT ATACATTTTC TACTTGGACC TGAGTTTGCT 55980 

GTCTCTCATG GTTCCTGGGT GAAAGAGGCC AGGCCCTGAG ACCTTTACCC AAGGTTGGCT 56040 

CTACCAAAAT ATCTTCTTGA GTGAGTTCTC TGGTTGATCA TCTGTGGAAC AATGTGGGAG 56100 

CCTACTAAAT ATGAATGGAA AATGAGGAAT GCAAAATGGA TGGTTTTCTC CACTATCACC 56160 

TCACCCTTGG AGGTGTTTGC TGATTTGGTA GATGTGTGGA GGAACTCAGG AGTCTGAATT 56220 

TGTAAAGGTA ATTTGGATGC TTCATTAGCT TAGAAAGGAC ACAGCAGGGA GAACTATATA 56280 

GCAGAGAAGG CTGGATGCCT ATGAGGGTAG GGAAGGGAAA AC7VAGGGGGT GGGGCTGTAG 56340 

CTGCCCTACC TCCGGTCCAT ATATGGCTGC ATTTCTTTAA TCTCTTTTAC TTTTGGGATT 56400

CCATGGTAGT AAACAAAGAG TTCTTATGTT AAAACAATTG CTATCTAATT GTACAGCATG 56460 

GTGAATATAG TCAATAACAA TGTATCATGT ATTTGCAAAT TGCTAAGAGA GTAGATTGTG 56520 

TTTTCACCAC ACACAAAAAT GGCAAGTATG TGAGGTAATG CCCATGTTAA TTAGCTCAAT 56580 

TTAGCCACTC CACAATGTGT GTGTGTGTGT GTGTGTGTGT GTATATATAT ATATATGTTT 56640 

ATGTATATAT ACACACACAC ATATATATGA CATGTCAGAA TGTCATGTTT TATTCCATAA 56700 

ATATATACAA ATTTTATTTG TCAATATAAA AAGAATAATA CCTGGAAAAA CAAAAAAAAA 56760 

ATCCTAAGTG CTATACTTAT AAAGAAATCT TCCTCATACA AAAAAGAAGA AATTCTGGCC 56820 

ACAGGAAGGT TGCCTGAAAA TGGCCACCTT TTTCATGATT TTCCCTCCCT TTCTGAGACT 56880 

GAGAAATGAG CCTTCTTGAA GACCCTGATG GAAATACTGT GAAGAAACTA AGACAGTTGG 56940 

ATTCAAGAAC CAAAATGCTT ATCGTAGCAG TGAGGTTGGC TTGAAGTCAG GGAACAGTGT 57000 

AAAGCTATTT GTGGGGAAAG ATAAGGCCAG AAAGAGATTG ATAAAATACA GGCGAGACCA 57060 

AAGGAACAGG GCAGGGGCAA ATTAGTTTAG GCAAGAATAG AGGCGTCTTG ATATTAATTA 57120 

AAATATGGAG GAGGAGTCCA GAAAATTCAT CCTTGGTGCT TGGGTAAGTT TAGCAACATG 57180 
TTCAGATGCC TGAGTTTTGT GTGTGTATGT GTGTGGGCAT GCACGTGTGT GTGTACACAG 57240

TGGGTCATTC TTCTCAGGAA GAGTGAGCCA CTCTCCCCTC CTCCAGCACC AAAGTGGCCC 57300

CCACCTTGGC ACGCCAGTGG CACATGCCAT TGGGCCAGGA TTTGCTCAGA ATGCAGGCAC 57360

ACA6ACATAA TGTCAGGAGG CATTGCTGGT GTGTGTCACA TCAACCTGTT A6AACAACTG 57420

TCAACGTGTG ACCTCCCAAA CAGAACTCAG GTGCCCCCTT CAGAGACCGT AAAGCTTGTC 57480

CTTAGAGGAT AATGAAGATC CCCAGGAACC TCATCTAATC CAAAACCAAA AGATTTGGGA 57540

AATGTGACCT TTAGAGGGGA GTAGCATTAA GAAGCAAAAT GATACTTATT AATTCTGTTG 57600

CTTATTTGAC TGTAACCAGT ATAATAAATG ATCATATTCT GCTCGATTTA ATTCCCCCTC 57660

CCCATAAGTT TCACAAGACC AGAAGGAGTT TCTTCTTCCC ATTGGTCTTA CATTAATATT 57720

CTTGTACGGC TTTCACTAAA TAGATGCCGT GTTCTGCCCT GGAGGTAACA CCACGTCATT 57780

AGGAGGAGAT GATAGACAGA AATATATACA AACACACACT TGCTTTCAAA AATAAATATA 57840

GGCCCTCTAG TTAAAAGGTA TTGTGTAAAG TGTGTGAGCA TCCTCTTTCT TGCAAAGCAA 57900

GCAC7VCAGCT TCCATTAATC TTGTAGCCAC AGCCTGTGTT GGTGTTAAGA CTCAGATTCC 57960

TTAACGCTTG ATACTTGGCT TAAAGAGATT CTTTGTCCTG GCCTTGATTT GGGAATTAAG 58020

ATCCCTAGGG TTTTTGGTTT TACAGTATGG ATCTTCTAGG AGACAACCCG ACTGACCTCC 58080

GGGTCTCCAG GCCACCACAC ACAACCTGGT TTGCTTTGCT CTGTTCCCCT TTTCCTCTGT 58140

GGGGACCAGC ACAGGACTCA ACTCAAGGGC TCTGTGTCTG TGCACAGGTT GGAGAGGGTG 58200

ATAGGGCCTT GACCTGTAGG GACAACCAGG AAGATTTCTA TGCAGAGTAA TTGGGTTTCT 58260

AGAGTTTGTT TCAGTTGATT TGAGGGCAAG CTGCTTGGCC [missing] GATTCTTCCC 58320

ATCCACAGAA TAAAGACAAT CAGCTTTGTT TATCACTCTG TTCATTTTGC TATGTCTTTA 58380

TCAGCCCCCC AGAGAATTCA GGAGCACAGA ACAAGTGCTG GAGGTCTCTC TTGCCAGAGT 58440

CCTCCTTGAG AACTTACAAT GTGTCCATAT TAAGGATCTG CTGTGTTTGA TGATTTTGTG 58500

ATTACACTTT AAACTTCTTA TCCATAAAGG ACATACTTGA TATATCTQAG ACTTGTAGTA 58560

GAAGOCCTTG AGACATCCAT CTCATCCCAT CATTATCTAT CTATCATCTA TCTATCTATC 58620

TATCTATCTA TCTATCTATC TATCTATCTA TCTATCATCT ATCTATCTAT CGCCAGTACT 58680

GTCTTGTTGA AGTTGGCAGT AGGGTGAAAG ACCTCAAACT CCAAAGGACT TTCCGTATGG 58740

ATGCAATATA CCTGCAATTC TAGCTTTTTT GTGTTTTTTT TTTTAGGTTG GGGGTGAGGG 58800

GTATTGTTTT CATTTTTGTT TTTCTTCTGG AAGGTTCAAC TAAGACCCAA GTAAAAAGAA 58860

GAATCAATAC TTAATAAGTA CCCAGCAAGT AGCAGGCACA CTTTTAGGTA CTTTATTTAC 58920

AAAAAAACCT CCACAAATAA AGTGGCTTGT GAGTATGAGG TGACATCTTT CCCTCCCCTC 58980

CCACCATCAC TACCCCAATA TGACTCGTCT CAATAGCCCT CCAATCTAAA ATGGACTAAA 59040

TACAA6TGGA TAAAGAAATG GAGATTTAAC CAGAATTCTT CAGCTATAAA TTACAGGGCC 59100

TATAATTAAA GGTGATTGGG ACTGGGTCAG AGAGCCACAT CACTTTTGTG GTTGCATTTG 59160

AAGTTCACTA TCTCTTGACC ACACAACCCT AGCCCTTCTA CTCCCACCCT GCTGTCTCAG 59220

GTTAATCTCA GGCAATGGTG TAAAGAAGGC CAAGTTTGTT TCCCTGGAGT CCCACGGGCT 59280

CTAGCAATAA TGCTTCCCTT TTCTCATGAG TGCCCCGCCA CCCACCCCCC TTCACCATCA 59340

CTACACACAA ATGCCCTGCA GTGGGTGGAA TGTAGTTACT TCAGGTTGTG CCTGATTTGT 59400

CTCTCAAGCA AAACTCCAGC AGGCCATTCC CTCAGGGCCC TGCTCTCAGA TCTGGAACTG 59460

ATAGACTAAT TGGGGCTAAT GTGATAATGG GAAATAATGA AATTTGTTGT TTTTATCAGT 59520

GTGTATATGG GGCGGGGTTT ACATTTGCAT TTTCACAGGG CCCTTGGCAA GTTCACAGGG 59580

TTGAACAGTT GGGAAGGGTG GGAATGTCTG GGGCAGGTTA GGGAGGCAGA GGGATTTATT 59640

AGAACTCCCC TAAACTGCAC TGACCAAAGC CTCAAGCCCT TCTTCAAGAC CTGCCCAGCT 59700

TCCAAGACCT TCCCAAGTCC ACCCTTGTTT TCCCACTGAG TCTTTTACAC TTTCAGAAAC 59760

CTCTGAATTT GTGTAGAAAC TAGAAAAAAT AA6TAAGAAA AGACTAATAC TACTGCACAC 59820

TCACTGTTCC CCCTTAATAT AATAACCAGT TTTTATTCTA TTCAGTCAGC CTTTGACCAT 59880

AAGCAGACCT TTTTTTTTTC TTTTTAACAC AAGTAACTTC TTGGTTTTGA TCACAAAATC 59940

TTTATCTCTG CCAAATCTCA ACTTCCCTTC CCTCTCCCAC AAAAGGGAGG CCCGTTGAGT 60000

CAAAGAAATC TGCTTAGACA CTTTGCTCAT GCCAGGCCAG TGTCCTGGAA GGTTCAACAG 60060

AGAGAGTTAA TGGTTGGGGG ATGGTATTTT TCTTTGCTAG GAGCAGTCAT TCACCCGTAT 60120

GGGAGAAGGT ACATTTGTGA CCCAGTGAAG CAGGTACAGG TAACTCCCCA TATGTCCCTT 60180

GGCCCAAGGG AATAGAGGTT GCCTGGGTAT TTGAATCCGT AGATCCTCCC TAATATTCCA 60240

CCTTCTTCTT GTCCAAACTG TGCTTTTTTA TTTCCAGTTT CAGCATTTTG GTCTTCTCAT 60300

CTCTAACTCT TATAGGGAGT GTCAATAAAC CTTTTAAAAA AGATCATGTA AGTGTCAAGA 60360

GGAAGTGAAG AACCTAGATA ATCCACCAAC CGGATAATCA GCTCTTGCAT ATTTGAGAGT 60420

TGACTGCTTG ACCTAAGCAT CTCCTCATAA GGTACCCTCC CTCCCAGGAC CTTCCCTTTC 60480

AAACCTCTCA AGGCTCTTAC CTGGGGCCAG GGGAGATAGG CTTTTCA7VAG TCCATTGAAT 60540
TGCCAAGAGT CTCTGTCAAG AAGGCAGTCA TGGTGCCTGG AGAGGGAACT TGCTGGGAGC 60600

CCCTTCAGAG CCTGGTACTT ATAGAGCTAG GGAAAAGATC TTGATGCCAA AGCAGGGTGG 60660

ACTAAATACA GACTAATAAA TGAGACAGGT GCTCAAGAGG GCCCCTCCAT ACCATCATCT 60720

CCTCCAGATT TGGACTTCTA CTCACTTTGC TTTTACATTC CCTCTTCCCG ATGQTGTCTT 60780

TGGTGAGCAG GGTGCTTTTC ACCTGAAACA GCCTCTGAGC TGAAAAGAAC AGTCACCACC 60840

AAATCAATTC CTCATCCATT AACAGGTTGT CTCTCTGTTC TTGAGACACA GGCATTACCT 60900

GGTTAGACCT GTTTTGTTTG AACACTAACG TGTGAGTTGG CCAAATGCAA ATGAGCCAAT 60960

GTTTGTAATC CTTTATTTTA TTTTTTTAAA GGGCTGGGTA GCCAATCAGA AGAGGGGGAA 61020

GTGACTTAGG GAATTCCCGG TTGGTGGCTT ATTGCTTAAC ATCCTACAAA ATGATTTAAA 61080

ATTATTGTTA TATGCATTTA TCTTCACTCT GATGAGGGCT CAGACTTGAT AACGCCCGTG 61140

GTGCCCCATC CCTATAGGAG CTGGTGAGAT TGCAGCCTGC TGCCTCCCCT CCATCAGCCA 61200

CAGCTATTGG ATTTCCCACC CAGAATCTTT AGGTAAATGA GGTAAGTCCT GATTTTTAAA 61260

ACTTCTTTTG AATCTGGAAT CCAAACACTT GAGTGGAAAG AGAAGCCTGC TTTAAACTGG 61320

ACAGATGAAA CTAGAACAGA CTCTTGGAGA CGGCTGGCAG GAAGTGAAGC TCACCTTACC 61380

TGGGCTTACC TCACTGGGTC AAATCAGAAT TTTATTTTGG AGGGCAGGTT GGCTACTTTG 61440

GATATTATCT GTGAATTTCC TGCATTGTCT GGACTTCTAA TCTCTGTGAA TTTAAAAGCC 61500

CCCTCGTTTC CCTATGCCTG GGTGGCAAAA CCATTCCCCT GGGTTGAATT CTTCTGGAAC 61560

AAATAGGCAG CTAGAGATAG GTGGCTCTGA TATAGCTCAG AGAAGAAGTG GTTGGCTAAG 61620

TAGCTGTTAG GGCTCAGAGT ACACGGTCTC GCTTTCTAGA GATGTCTTCT GCTG6TAATT 61680

TTTCTGACTT ATGAGCTACA TGGAAAGGCC AATTTGTTTT TAATATGTTC CAGGACTGGA 61740

AAATGGCTAG AAATAGGCAA GAACATACAC AATCACACTG GAAAAAGTGG CCAGGCAGCC 61800

AAGGCAGGCA GAGGTATTGG GGAGAGCTGA ATATCTACAA AAACAAAAAT TCAGAAAAAA 61860

CAAA/^TCAA TTTTGGCAAA GGGCTTCACT GTATAACAAG GGGACAAACT AACCCTTTGT 61920

TTACAAACTA ACCCTTTGTT TACTCCATTT TGTCCAGAAA ATACAACAAT CAGTTTTGGC 61980

AAAGGGCTTC ACTGTGTAAC AAGGGGACAA ACTAACCCTT TGTTTACTCC ATTTTGGGAG 62040

ACTATGATCA GACAGGCAGT TGTGACTCAG CAGCAACAAA TGCCTTCTGA GACAGGGATT 62100

CTTTTGATTT TGCTTGGACA TTGTGGAGAA GTGTTAGCCC CAATGTGGAC TGATCTGGGA 62160

ACAGTGGGAA ATTAACTTCT TGTTGGCAAA TATCAGGCTG AGGTGAGAAA GCGACATTTT 62220

CACCGTCCAT CTTTGCTGAT TTACCGTGCT CCCAGGATGG TGGGAGTGTG TGTTTTTAAG 62280

ATGGAGAGTG TATGCTTCTG GGTTCAAGTT CACAGGTGTC TCTGCTGGTT ATCTGCACTC 62340

ACCTTGGTAA CAGGGAGAAA GTGAGTGAAT GGATTCCAAG AACTTACTGA TGGAAGTCTA 62400

ATTCAGGAGT TGGTTCTGCA GCCATGGAGG TAAAGATGTG TTGATAGTCT TTCAATGTGT 62460

AAAAGGGCAA TTAGAGATTC TGTGTGACTG TGTGTTAATT CCACTGGGGT CAGGGGAAAA 62520

ATTTATTTCT AACAGAAAAG AAGAAGATAC GTTATTAGGA AGAATTTCAT GGCTAGGAGA 62580

TACTATCAGA AAAGGCTCTT AAGAGATTTT AAGGATGACT TTAATAGCCG CATTTGAAGT 62640

TTGCAGAGGA TCCACTTTTC CTCTTTTTGT GACCTAAAAT TCTGGGATGA TGAAATAACT 62700

CACCAATTCC ATCTTCTTAT AATATGGAGT CATGTAGACA ACACCATTTT CACACAAATG 62760

GCTAATGGTA TTTAAAAACC ATGATGGAAT GTGAATTGGG AGTCATTTGG AGGTCTGTAG 62820

TTGAACTTGA AAAAATAATA AATGTAATGG AGACAATACT TCACCGTGTT TCCAAAATAT 62880

TTTACAGAGG CATTTTAAAT GAAAGTCACT TTGAGGGAAC AGCTGTGCTG TAAGTTCTCT 62940

TACATGACTG CGCAAGATGG TAGCCTTCAT CAAGACCTCT CAAGGTAGTG TGGGTAGGGT 63000

GACGTGTTTG ATTCAGGCCT CGTTTGTTAT GAAAAGGCTC AAATTCAATT GTATTTGTTA 63060

TTTTTTTGGT TAAAAAGCAC CTATTTGTTC AATTCAAACA ATCCTTTTTG GTTTTTTTTT 63120

GAGATGAAGT CTCCGTCGCC CAGCCTGGAG TGCAGTGGCA TGATCTTGGC TGACTGCAAC 63180

CTCCGCCTCC CAGGTTCAAG TGATTCTCCC AACTCAGCCC CCCGAGTAGC TGGGATTACA 63240

TGTGCTCGCC ACTATGCCCA GTTAAGTTTT GTATTTTTAG TAGAGACGGG GTTTTGCCAT 63300

GTCAGCCAGG CTGGTTTTGA ACTCCTGACC TCAGGTGATC CACCTGCCTC AGCCTCCCAA 63360

AGTGCTGGGA TTATAGGCTT CAGCCACCGT GCCCAGCCAT ATTGTTTTCA TTTTTAATCT 63420

ATTAGTCTAT CGTGATCTCC CAGTGGAAGT ATCTTTGGCC TTTGTGGACG TCAGGAAAGC 63480

CCTACATTCC CACTCGCGAT TCCATGTTTA TGGGTACCCT AAATGCTCCC ATTAATTGAC 63540

CAACTTTACC CTGATCTTCT TTCAATATCT TTCTGACTCC TTGAAGGTAT GAGACAAAAT 63600

GGA7VACTGAG AGGTTAAAA6 GTTTACTAGG TTGCATTCAA TTAGCGAATT GGAAACTGGA 63660

AGGAGCTCCT ATCGGGTCTC AGGTCAGAAC GTGAGTGCTT TTGGCCAAAG TTCACTTCTG 63720

AGGAAGTAGA ATTTCGCTTT CTGGAATCTT GCGATATTTT ATTTCCTCTA TATCTTTCCC 63780

ATGCCCCCGA CCCACCCAAT CTCCACAAAT TTGGGGATTT GAGCACTGGG TTGTGATCGT 63840

TAGACCATCT TGCTTTTCTG AAAGCCCAGG GCAAGACCCC TGCTTCATGT CACAGTATCA 63900
AACACAGACA TAGAAGCTTG TACAAATTAT TGAGAAGTTA TTGTCTTTTC TCCCTTCCTC 63960

CATATGGAGT CATCTCTATG CCCTTTCATA CAGATGTGAT TTACGAAGAC CTCTGGGTTA 64020

GGGGTGGGGT GGTGAGCAAG AATCCCGTGG CAGAATCTGC TAACACACTT GAGAAGCAAT 64080

GTTGTGGTTT TAAGGAACTC AATCTAAAGC TTGAACCTGA TTTTCAGGGA TACCATTTTG 64140

CTGCCGTTTC AGCCCATTTC TCTTGTTAAG ATCGCTCTCT GGTAGAGTTG ACGTGACACT 64200

CATTTCTGTT GTGGGTGGGG CCCTGGTTGG GAGGCATTGG CTCCACTGCA GCCTGGGTGT 64260

CTAGAGACCA CATTCTCACC CTGCCTTTGT TACTGGGAAA CCGAACGCGG CGCTGTGGCT 64320

TTCAGCTTGG GTAAGCCGGG TCTGCGGCGG GGATTGCCAT CTGAAGACAG AGGCAGGAGG 64380

GCAGCCACAC CTTGCCCAGG TTCTCTTAAA TCTCTTGCTC TATAACTGAA AGGAGGGCAT 64440

AGATAATTAA CTTTATTTGA CATTTTTCAT ATCTAATTTT TAAGAATATG ATTTTAAAAT 64500

AATAGATTTG TTCTAAAGAG CAAACAATCT TGCTGTTATT AAAAACGTGT TTACTTAAAT 64560

TGAACGGGGT TTCAAAGGGC CAAGCTACTA AGCTGTGCAG GAAACAAACA GTGCAGTGAG 64620

GAGAATGGCT CCTCACCACA GCTATTCTTA GGGTGGGACA TAGTTTCAAG CCAAATGACA 64680

TTGATGTCCG GAAACCAGGA TGTGCTGAAG TAGAAATTTC CAGGGATCCC TCAGAGTTAT 64740

TTGCTAAAAT GTTTATTATT CTTCAGAGGG GGGTGGAAAT ATTTCTTTAA GAGTCTTCCT 64800

TGAAGAATTT TGAACTCCAG CTTTGGAGTG ATGGGAGCAC AGTGCAGGGA AGGCGGGATG 64860

TGAGGTGGTG TGCTGGACGG CAGTCTAGGG ACCTGGTCTA GCACTGGCAG AGCTGTGTGT 64920

CCCAGAGCAC ACATTCCCCT TTGCCAGGCT TTAGTTTCCT CCTCTAGGCA AAAGGGTTTG 64980

AACCTGACCA TCTTTAAGAT CCATTTTAAC CCTC7VGATTC TGTGGCTGTG GTGATTGGGG 65040

GTGGTGGGAG TACCTGGGGG TCAGCAGGAT AAGCACGAAT CTGTGAGAGC TGAGAACAGG 65100

TGGGAGAAGC CTTCTAAGGA TGAGGCAGGA AAGATTAGCA AGAGCCCTTA AATGGATTCT 65160

TTAGGGCCTT CAGAATTTTG GCTAAAGGCT ATACTAGTGG AGGTACTAAG ACCTGACACC 65220

TGGAGCCTTT ATTAAGGATG TTAGAATCCA CTCCCATGAC AACATCCCAG CTTTGCCAAT 65280

TTGCCTCTITG TGTCTCAAGC TGGTGGGAAT GTAGAAGTGG ATGAAACAGA CTGTTTTGTG 65340

ATGGCAGGGA ACAGCCTATG CACAGGGGCA GGTGCTCTAC TGGTGTCTTC TATAAAACGC 65400

CAAAGCAGCC CGCCAGAAAA TGGACATTTA GGCACTCGTG GTGTCTACTG AGTTTGTATG 65460

GTACTGATGA GCTTGCTTGA CTGATTATCC ATGACTTACT GAGTAGATCG AACGTATGTG 65520

GACTCACTTC TCCTAGAGGA AGACCCTGTG GCTGCCCCAG CCACTGAGCA GCCTAACCTG 65580

GAGACCCTGA TGTGCCCAGA AAGCGTCAAC CTTGTATCTG GAGAAACCAG AACTTGCAAC 65640

AGGGCCAAGC AGGGTGGCCC ATTTAAAGAG GCTCCTAGGG TTTT7U\TTGA CCTTGTTTTA 65700

AAAGAGACAC CCTGTAAAAT ACTCCTATGA AAACTTATTT CACAAGCACC TAACCGCATT 65760

CTGTCTTTGG TTTGTTTTAC GGGGCCGGGC CCCTTGTTCT GGTCAATTGG TCTGCATTAT 65820

CTCTCCTCCT CCAATCTCAC CACACACCCT GGCCTCTGGG AGGCTTCCTC CCTTCTTTTT 65880

TTTTGTTTGT TTTGTTTTTT TAGCATCTTA GTTGTTACTA GGGGTACTTG CCTACTTATT 65940

TAAAATATGG CCAGTATAGG TGCATACAAA ATGTGCTTTC TGATTAAAAC AAAGCCAAAA 66000

ATAAAAAGAA ACCAAAATGC CTATTATAGT AGTTGGATTT TTAGACTAAC AGACCACCTC 66060

ATTAACCCTG TCATTTTACC ATAACAACTT ATTTTTATCT TTGTATGACC TTGTCTCAAT 66120

GTCCTTTTTC TTTGATGTTG TTGCAATTAT GAACATCAAA TTTCATAGCT GCTTTTCCAC 66180

CCCACTTTCT ATCACAGAAG CACAATAAAT AATCTTGGGG GCTGGGCTCT TGTTGGCCCA 66240

ACTGTGGCTT CAAAACATTT CAGTTGCCTG TCCAGCCCTT TCTTAGCCTG ATACAACATC 66300

CCCCAAAAGT CTGTTGAGCT TTTCCTGGAA TAA6AAGAGG GTCTTCTACT TTTTGAATAG 66360

AGCAATGGAG ATTGGAGAAT ATGGTCATCT TGTGGAGGTT ATTCCAGGCT TCTTCTTAGG 66420

AACCTTAAAA AAAATCTCCT CAGTAGGGCT GATGATATAT TCTGGACAAT AAGGTGAGCA 66480

GAGTCTGAAA GATGAGAGCA ATTTTCAATC TTGTCATGAT TTCATCTAGT CAGCCTCATT 66540

TCATCTAGTC ATGAGGCTGA CTAATGATAA GACTTGCTTT GTCTTTGCAG TGTACTCTAG 66600

ATTTGACTCT AAATTCTIGCC TCTGTCTTGA TCATGCCCAC TTAGAAAATT AGAGTGCAGC 66660

TAGCTCACCT TTTAGTCATC TTAATTCCAC TAGGCAGAAG GCTGTGGGTC AAGGAATGTT 66720

GATGGAGTAA AATTTGACTG CATGTGTATC TGAAGQGGTA GGAGGCTAAG AGATTTTATG 66780

GCTTGGAAGC TGCTGAGATG TGGTGTAAAG AACACTGGAC TTAGAGTCCA GACACCTGAG 66840

TTTAAGCTGG ACTCTACCAC TGGGTAGTTG AATGACTTTG AGTGAGTTAT ATAAGCTCTA 66900

GCATCTAAGT TTTCTCATCT GGAAAATGGA GTTAATAACA TCTACTGCAT TGGGCTGTTG 66960

TAAAGATTAA ATTAACAAAG AATGTGAAAG CACCTGAACA AAAGCTTGTG AGTAAATAAT 67020

TAGTAATTTG TGGAATGAAC ATCAAGGGAA GTCTTCAATT TGGGTGTTTT CAGTGAGTTT 67080

CTGTTGGGTC AGAGTGAATG GATATTAAAT TCTGGGATTT TGGTTTGTGT GTGTGTGTGT 67140

GTGTGTGTGT GTGTGTGTGT GGCGATCAAC ATTGGTTCTT CACTGTGACC TTAGGAAAGA 67200

AATGCAATAG GGTTTTTATT GGGAAGGTGG GTAGCAGGGA GATGCATGAA CCATATTAAG 67260
GGGGGACCTC CAAATTGAAC CTTGTTTTGA GTCAACTGCA AACCACAACC AAGGAGGTCC 67320

TGGGAGACCT GGGGTGACTT GGGGTGATTG GGTATGCAGC ACATTCCTGT TCTTGTGTCC 67380 

TGATGCCTGG CAAGTAGGGA CCTGCAGAAA ATACTGATTC TCCTCCAGGC AGTTCACATG 67440 

ACTAGCTTTT AGGAGTGAGT ATACCGTTGC CCACCCCTAA AATTCTTGAT CATGTCTCCA 67500 

GATGTCTACT GACCACTGAT GCTGAGGTCA TGAATCTTGG GCATTCTAGA GGCTTTGGGA 67560 

AAAAAAATTC TACTTACTTC TTTTGCCCAG ACACTCTGGG GTCTACCTCT TGGTAAATTA 67620 

TTCAAATGAG GTTTCTGGTC ATGCAAATGT GGTTTCTAGA GCCTATTTGA ATTGAACAAG 67680 

TAGTTCTTAT TATTAGTAAA ACAGCAAGGA TCCCTAACTT GGGGTCCAAG GGTAAATTCA 67740 

GGGTTTCTGT GAACTTGGAT GTAAAAAAAA ATTGTGTTTA TTTTCAATAA TCTCTAACTA 67800 

GAATTTAACA TTTTCTTTCA ATATGAATGT AGGCAAAACT CCATGGTAGT ATTAQCTGCA 67860 

ATTGTGACTA TCACCAGGAT T^TCACATT TTCATGTCTT ATTACACCTA TTACATATAT 67920 

CACAAAAAGT GGGTATTTGA TATCAAGTTA GATCTGCACT AGGTAGATAT TCTTATTTAA 67980 

TGTATTAACA AGGAAGCACA TATATTGTTA TCAGGTTGGT GCAAAAGTAA TTGTGGTTCT 68040 

TGCCATTAT^ AATAATTACA AT^CAGCCA GTCTGGCCAA CATGGCGAAA CCCCATCTCT 68100 

ACTAAAAATA CAGGTGTGGT AGCACACACC TGTAATCCCA GCTACTTGGG AGGCTGAGGC 68160 

AGGAGAATCA TTTGAACCTG GGAAGCAGAG GCTGCAGTGA GCCAAGATCA CACCACTGCA 68220 

CTCTAGCCTG AGCAACAGAG TGAGACTCTG TCTCAAAAAA ATTAAAAAAT AAAAAAAAAC 68280 

TCTGTAATTA CTTTTGCACC AACATAATAT GATATCACAC ATTTATTTTA AAAAGTATTT 68340 

TGACATTGTC TTTTAATATA AATTTTTTTA AATCTTATAA TATTTTAATT TGTCATGTAA 68400 

AAATATTATT TTGAGAAGAG GCCTGTAGGC CTCACTAGAT TACAAAACAG ATCCATCGTA 68460 

CAGATGAAAG GTTAAGAACA CCTCATTTAC AGCATTCTCT CACACACGAC T7VACGAAATG 68520 

ACTTCTGAAC AGCGCCAGTT GATAGATGTT CTCTGCCAAA AGGGGAATAT GATCTTCCCA 68580 

TATGTTCCTG CCTATGGGTA 6CCTTGGAGT TGTGAAGGGA CTTTGGCATA ATGAAGATGA 68640 

TAATAAGAAT GATAATGGTA ATTTGTTGAG TGCCTGCTGT AAGCCAGGTG GTTACAGTCC 68700 

TGTTCAATGT CATGTTTAGT TTAATCCTCC CAATGACCTC AGGAGGTAGT GCATGGAACA 68760 

AAGACAGAAG AGATCCCCTG CCCACCCACG GTAATGAAAC ATGGGTACAG GTGAAGGCAA 68820 

AAGTGGGGAC TGACCCTTTG GAGATGGCTG ATGTCACGAG TGTGCAACCT GTGCAGTTCC 68880 

ACGGGGCCCC ATGCTTAGAA AGGTTCCATG TTTGGTTTAA GGCTCTGCTG TTGCCATCTT 68940 

AAAATTCTTC GTAAGTTTTG AACAAAGGGC CCTGCATGTT CCTTTTACAC TGAGCTCTGC 69000 

AAATGATGTA GCTGGTCCTG CCTCTGGTTA TGGTGAAATG GAATGTATGA CAACTCCTGA 69060 

GACTGGGAGT CTGGGAAGCT GCTGCGGAGA GCCCTCTCCT CATTTTCATC AGGCTCAGCT 69120 

ACGCAACCTC TGGTGGAAAG CTATGGCCTG TTGAGGAGGG AGGATGTCGT TTTTGAGTTA 69180 

GTGAGTTTTC CAGTTTTGTT TGAGCTCCAA AGCTTTCCTC CAAACAACTG GAAAGATGGC 69240 

TGAATAATTG GCTGAAAGGG ATTTAATCCC TTGAAAAACC TTTCTGGTAG GGAGTTGCTG 69300 

GCAATACTGG TGGGTTTTTC ATGATTTTAT TTTACAGAGG GCTTGCTACG TAAACCAGTG 69360 

AGCCAGGAGA AACAGAATAA AGTCTGTTCT GGAAGGAAAA ATGAGACCTG GTGTGCCACG 69420 

AGTCTAGTGT TCTCATAGGA AGGCTCTAAA AACAAACTCA GCTTTCCTGC TATTGAATGA 69480 

TTATCTCTAT AAAAGGAAAC TTTACTTCTT CTAAAGGAGA GGTCGTCTAA TTTGTGAGAA 69540 

AATTCAGATG TTATTTGCTT CTTAAGCTGC AAGGATGCTA ATGAAATAAT TCTCATGAAG 69600 

TTCTGTTGGT GTTTTAGGGC TAAGTTTTTA TAGACTGTTC CAAAATTCAA AACAGGGATG 69660 

TGGACGTAGT GATGGTGGAA GAGGGGAAGA CTTTTCCTCG ATTTCTTTGC CTGAGGGATG 69720 

GAATTCAGGC TCCCCCAATA ACATATTCAT GGTCTTTCTC TGGTCAGTCA GTGATGTTCA 69780 

TAACACAAGC AAGCCTGTCA TCAGGACCAA TCTGTGATGG CTGAGACATC AGGTGCTCTT 69840 

CCAAAAGAGC CATAATTCAC CCTTCATTTC CCAAGGTTTT TTTTTTCTTG CTGTTATTAC 69900 

TGCTCTTTTA TCATGGTTAA TAAGTCTGAG GTGGCTTCAG ACAGCCAGTC CTAACCCCTG 69960 

AGTCAATCTG GGGCCTCTAA CAGGAAGCCA GACTGAAGTT CTGATAGATG GGTTTGAGTG 70020 

GCTGTGAACT GTGTTTCTGT AGCATCCAGA CTGATTTGCA CTGAAAGGGA GCTTCCATAT 70080 

TAGGGTACAA GGATGATCAA TATGTCTCCT GTTTATATTT GGTGGAAAAA GTTGTGGGAA 70140 

TCGTGCTTAA AGGATCTCAA CTTTGAAATT AAAAGTATAA CGTCCTAACA GACATCCTCC 70200 

TTCTCTTTAG AAACACAAGG ATCCATTTTC AAGTAATTTC AAAAGAACTA TGTTGCTTTC 70260 

CCCACCCCTT CCCAAGTACA CTTATTATAA TATATCCAGT CCATTTGCTA GCTTTGTGTC 70320 

TTTAGAAAAG TTGCTTAACC TCTCTCTGTA AAATGGTGCT TATATTAGTA CTAACATTCA 70380 

GGGTTATTGT GAGGATTAAA TGAGGTAATT CATGTAATGA CTAGTTCTAT TTCTAGCACA 70440 

ATTTAAACCC TCAACAAATA TGAACTATTA TCACTGTCAT AGTTTTTGTT GTTGTTTTCT 70500 

AATTATATAA TCTTCAAGAT TCTGAGATGG GGGCTGTTGC TCTTTCCTTG ACTTGAACAT 70560 

CTTGGTCTTT TCCTAGGAGG AT^CTTGACT CTTGAAATGG TCAAATCCAT TGTCCTAGTT 70620 
CATCCTGACC GCTCCCTGGC TCCAATCCCC ACCCCTTACC GTCCTCCACC GTTCCTACAT 70680 

TCCTGCACAG TTGGTCTTAT TTATTTTTCA GTCAACTAAG GGTGTTGTTA AATCTTTTAT 70740 

TTTTCTGCTG CCCGATTTGG TTCTAAGCAC TCCACTCCCT ACGCTGCTCA TAACAAGAAT 70800 

GCCTGGGAAC GCTCAGTCAG CCATATCCCT CCCCTGTCGG AACACCCAGT TCTTAATGCT 70860 

CCTGGAGAGG CAACATTTCT GAGGCCCCAC TGCCATAAGC CCCCCTCCCC CTITGAAGCCA 70920 

GTGGTCTGGT AGTAATGAAC CCCCAACGGC CCGGAGAAAA CTGGGGCAAG GTGTTTGTCT 70980 

GGGGAAATGT TGCATGTTGC CTTGACTGTG CTTTCTTCTA CAAAGCTTAA AAAGAGATAT 71040 

TATATTATTT TATTTTTATT TTTATTTTTG AGATGGAGTC TTACTTGGTT GCCCAGGCTG 71100 

GGTGTGCAGT GGCACAGTCA TGGCTCACTG CAACCTCCAC CTCCTGGGTT CAAGTGATTC 71160 

TCCTGCCTCA GCCTCCCAAG TAGCTGGGAC TACAGGCACA TGCCACCATG CCTGGCTAAT 71220 

TTGTATATTT TTAGTAGAGA CGGGGTTTCA CCATATTAAC TACATTGGTC TTGAACTCCT 71280 

GACCTCAAGT GATATGCCCG CCTCGGACTC CCAAAGTGCT GGGATTAAAA GCATGAGCCA 71340 

CTGCGCCCGG CCAAAAAGAG ATATTCAAAA GCTCCCTCTG ACTGTGTGTG CTGAAGGCTG 71400 

AGTGCTGATG CCATTGCTTA ATTAATGTTG TTCATGATCT CCATTTGGGC GATTTGTTTA 71460 

GCTCCTTGTG GCCCTTTTTG GACTTAGCTT ATCATGTGAC ATTGACAAAT TAATGAGAAG 71520 

TGAGCATGTG ATGATGCTTG GATTAGGACA GAAATCACAT CTAGGACATC TCAGGCCCTT 71580 

TCCACCTGGG ACCTGAGACC TCAAATCTCT TGGCAGGAGA TGAGTGGGTC TACACAGCCC 71640 

GATTTTGAGG TAGGTGTGGC TAGCCTCATT TATGCGATGG GAAAACTGTG GTCCGGGAAC 71700 

CAGGGGTTTT CAAATTATGC TTTTTGCCCA GGGCTGGATG TAGGATGTCT GGGGGAGAGG 71760 

CTTGACTGAG ATCTGGGTAC ACTGAGCCTC CACTTTAGGA GGTAACCTAG AGACTACACC 71820 

TACTCCCTAA ACTGTATTGA CTTTTGGAAG TCAACCATTT AGAAGAGTGT GGTTTTGGTT 71880 

TCGATCGTAT CCCAGCAGTC TTTTCTCTGC CCTTGTTAAT CTGATTCATG ATCTGAACCT 71940 

GGGCTGGCTG GAGGCTGGCC ATGTCACTTT GCAGACCATG GACACCCCTG AGTGCCCTCA 72000 

CAGAACCAGC CAATGGAAAA GTACAACGTC TTCTGGCTTC TCAGCCTTGC CATCTCCCTC 72060 

TGGCCTATTT GATACCCCCT TTTATATTGA GGGAGTGAAA ATGTAGCATC CAAACTGAAA 72120 

ACGCAGGTTT TTCTTTGGTT TTTATAGGAA AAACAAATTG GCATGAACAC TCAGTCAAAC 72180 

CAGCTCAGGC TGTTTGGGCA GATGCCTTTC TTTGCTTTTT TCTGTTTATT TTCCTACAAA 72240 

TCAATGCTTA ACTGCGTTGT TATCGGAGCA GAGCAACAGG TGCAAAAAAA TAACTCTGCT 72300 

GCCAACTCAA ATGAAAAGGT AGGGCTTATA CCCTCTGGGA GGTATTCAGA AGATAACAGA 72360 

AGCCCCTGCC AGCAACTGAA TTAACAGCTC TGTTTACGGT GGGTTTTATG TTAACAACCT 72420 

GCTCCTGACC CTCCTACACA TAAACACACC ATTGTCTCAG AGAGAGACAT TCAGCCATCC 72480 

AGACa\ACCCA CTGCTTTATT CTGCCCTGAG TGGAGATTGG TTTTGGCTCA GGCTGCTTTG 72540 

TGAAACTCAG AAGCATTATC CTCTCTGCCA ACTCCACGTC CTAGTCAGAG TTTTCTGTGA 72600 

AGGCAAGGGC ATGGGGTTGC CGGAGAGAAG AGGATTGGTC CTGCTTTTAA GCCTAGCTGA 72660 

AATTCTTTTC AAGGTTGGTC ATTCTCAAAT GCCAGAGAGG GTTGCCCGGC TCTCTCTGCT 72720 

CTTGCCCCAT TCCATTCACA ACAGGAGGTG GGGAATGAGC TCAGATGACT TTGGAAGGAG 72780 

CCACTATTAT TTTGGAAGCC GTGTCCTTGT GAATAGTCCA TCAGGGTAGG GCAGCGTCTA 72840 

TGTTTTGTTA ACTATTGTAT CGCCAGCACC TAGCAAAGTG CCCAGCATCT AGTAGACACT 72900 

TGGTAAATAT GTATGAATTA CAGAGGGT 72928 

(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5427 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

ATTTATCTTC ACTCTGATGA GGGCTCAGAC TTGATAACGC CCGTGGTGCC CCATCCCTAT 60 

AGGAGCTGGT GAGATTGCAG CCTGCTGCCT CCCCTCCATC AGCCACAGCT ATTGGATTTC 120 

CCACCCAGAA TCTTTAGGTA AATGAGATCA TGATTCTGGA AGGAGGTGGT GTAATGAATC 180 

TCAACCCCGG CAACAACCTC CTTCACCAGC CGCCAGCCTG GACAGACAGC TACTCCACGT 240 
GCAATGTTTC CAGTGGGTTT TTTGGAGGCC AGTGGCATGA AATTCATCCT CAGTACTGGA 300 

CCAAGTACCA GGTGTGGGAG TGGCTCCAGC ACCTCCTGGA CACCAACCAG CTGGATGCCA 360 

ATTGTATCCC TTTCCAAGAG TTCGACATCA ACGGCGAGCA CCTCTGCAGC ATGAGTTTGC 420 

AGGAGTTCAC CCGGGCGGCA GGGACGGCGG GGCAGCTCCT CTACAGCAAC TTGCAGCATC 480 

TGAAGTGGAA CGGCCAGTGC AGTAGTGACC TGTTCCAGTC CACACACAAT GTCATTGTCA 540 

AGACTGAACA AACTGAGCCT TCCATCATGA ACACCTGGAA AGACGAGAAC TATTTATATG 600 

ACACCAACTA TGGTAGCACA GTAGATTTGT TGGACAGCAA AACTTTCTGC CGGGCTCAGA 660 

TCTCCATGAC AACCACCAGT CACCTTCCTG TTGCAGAGTC ACCTGATATG AAAAAGGAGC 720 

AAGACCCCCC TGCCAAGTGC CACACCAAAA AGCACAACCC GAGAGGGACT CACTTATGGG 780 

AATTCATCCG CGACATCCTC TTGAACCCAG ACAAGAACCC AGGATTAATA AAATGGGAAG 840 

ACCGATCTGA GGGCGTCTTC AGGTTCTTGA AATCAGAGGC AGTGGCTCAG CTATGGGGTA 900 

AAAAGAAGAA CAACAGCAGC ATGACCTATG AAAAGCTCAG CCGAGCTATG AGATATTACT 960 

ACAAAAGAGA AATACTGGAG CGTGTGGATG GACGAAGACT GGTATATAAA TTTGGGAAGA 1020 

ATGCCCGAGG ATGGAGAGAA AATGAAAACT GAAGCTGCCA ATACTTTGGA CACAAACCAA 1080 

AACACACACC AAATAATCAG AAACAAAGAA CTCCTGGACG TAAATATTTC AAAGACTACT 1140 

TTTCTCTGAT ATTTATGTAC CATGAGGGGA AAAAGAAACT ACTTCTAACG GGAAGAAGAA 1200 

ACACTACAGT CGATTAAAAA AATTATTTTG TTACTTCGAA GTATGTCCTA TATGGGGAAA 1260 

AAACGTACAC AGTTTTCTGT GAAATATGAT GCTGTATGTG GTTGTGATTT TTTTTCACCT 1320 

CTATTGTGAA TTCTTTTTCA CTGCAAGAGT AACAGGATTT GTAGCCTTGT GCTTCTTGCT 1380 

AAGAGAAAGA AAAACAAAAT CAGAGGGCAT TAAATGTTTT GTATGTGACA TGATTTAGAA 1440 

AAAGGTGATG CATCCTCCTC ACATAAGCAT CCATATGGCT TCGTCAAGGG AGGTGAACAT 1500 

TGTTGCTGAG TTAAATTCCA GGGTCTCAGA TGGTTAGGAC AAAGTGGATG GATGCCGGGA 1560 

AGTTTAACCT GAGCCTTAGG ATCCAATGAG TGGAGAATGG GGACTTCCAA AACCCAAGGT 1620 

TGGCTATAAT CTCTGCATAA CCACATGACT TGGAATGCTT AAATCAGCAA GAAGAATAAT 1680 

GGTGGGGTCT TTATACTCAT TCAGGAATGG TTTATCTGAT GCCAGGGCTG TCTTCCTTTC 1740 

TCCCCTTTGG ATGGTTGGTG AAATACTTTA ATTGCCCTGT CTGCTCACTT CTAGCTATTT 1800 

AAGAGAGAAC CCAGCTTGGT TCTTTTTTGC TCCAAGTGCT TAAAAATAAG TTGGAAAAAG 1860 

GAGACGGTGG TGTGGAAATG GCTGAAGAGT TTGCTCTTGT ATCCCTATAG TCCAAGGTTT 1920 

CTCAATCTGC ACAATTGACA TTTTTGGCCG GAGTGTTCTT TGTGGTGAGG GCTTTCCTGT 1980 

GCATTGTAAG ATGTTCAGCA GTATCCACTC ATGGTCTCTA ACCACTTGAC ACCAGAAACC 2040 

CCCCAGCTGT GATAACGCAA AATGTCTCTA GACATCACCA AATGTTCCCT GGGGGTGGCA 2100 

AATTTGCCCT TGATTGAGAA CCACCAGTTT AGCTAGTCAA TATGAGGATG GTGGTTTATT 2160 

CTCAGAAGAA AAAGATATGT AAGGTCTTTT AGCTCCTTAG AGTGAAGCAA AAGCAAGACT 2220 

TCAACCTCAA CCTATCTTTA TGTTTTAAAT ATTAGGGACA ATAAGTTGAA ATAGCTAGAG 2280 

GAGCTTCTTT TCAGAACCCC AGATGAGAGC CAATGTCAGA TAAAGTAAGC ATAGCAATGT 2340 

AGCAGGAACT ACAATAGAAG ACATTTTCAC TGGAATTACA AAGCAGAATT AAAATTATAT 2400 

TGTAGAAGGA AACACCAAGA AAAGAATTTC CAGGGAAAAT CCTCTTTGCA GGTATTAATT 2460 

CTTATAATTT TTTGTCTTTT GGATTATCTG TTTACTGTCT CATCTGAACT GATCCCAGGT 2520 

GAACGGTTTA TTGCCTAGAT TTGTACTCAG AGGAATTTTT TTTGTTTTGT TTTGTCTTTT 2580 

AAGAAAGGAA AGAAAGGATG AAAAAAATAA ACAGAAAACT CAGCTCAGGC ACAATTGTCA 2640 

CCAAGGAGTT AAAAGCTTCT TCTTCAATAG AGGAATTGTT CTGGGGGTCC TGGAGACTTA 2700 

CCATTGAGCC ATGCAATCTG GGAAGCACAG GAATAAGTAG ACACTTTGAA AATGGATTTG 2760 

AATGTTCTCA TCCCTTTTGC AGCTTTTCTT TTTGGCTCTC TCATGTCCTT GGCTTGCTCC 2820 

TCTATTCTAC CTCTCTTTCT CCAGCAATAA TATGCAAATG AAGACATGTA TCCATAAGAA 2880 

GGAGTGCTCT TCATCAACTA ATAGAGCACC TACCACAGTG TCATACCTGG TAGAGGTGAG 2940 

CAATTCATAT TCAAAGGTTG CAAAGTGTTT GTAATATATT CATGAGGCTG GAAGTAAGAA 3000 

GAATTAAAAA TTTGTCCTAA TTACAATGAG AACCATTCTA GGTAGTGATC TTGGAGCACA 3060 

CATGAATAAC TTTCTGAAGG TGCAACCAAA TCCATTTTTA TTTCTGCCTG GCTTGGTCAC 3120 

CTCTGTAAAG GTTTAACTTA GTGTTGTCAA GTAACAGTTA CTGAAAGAGC TGAGAAAAAG 3180 

AACAATGAAC AGCAACGATC TTGACTGTGC AACTCAGACA TTCCTGCAGA AAAGACATAT 3240 

GTTGCTTTAC AAGAAGGCCA AAGAACTATG GGGCCTTCCC AGCATTTGAC TGTTCATTGC 3300 

ATAGAATGAA TTAAATATCC AGTTACTTGA ATGGGTATAA CGCATGAATA TTTGTGTGTC 3360 

TGTGTGTGTG TCTGAGTTGT GTGATTTTAT TAGGGGCATC TGCCAATTCT CTCACTGTGG 3420 

TTCCTTCTCT GACTTTGCCT GTTCATCATC TAAGGAGGCT AGATCCTTCG CTGACTTCAC 3480 

CATTCCTCAA ACCTGTAAGT TTCTCACTTC TTCCAAATTG GCTTTGGCTC TTTCTTCAAC 3540 

CTTTCCATTC AAGAGCAATC TTTGCTAAGG AGTAAGTGAA TGTGAAGAGT ACCAACTACA 3600 
ACAATTCTAC AGATAATTAG TGGATTGTGT TGTTTGTTGA GAGTGAAGGT TTCTTGGCAT 3660 

CTGGTGCCTG ATTAAGGCTT GAGTATTAAG TTCTCAGCAT ATCTCTCTAT TGTCTTGACT 3720 

TGAGTTTGCT GCATTTTCTA TGTGCTGTTC GTGACTTGGA GAACTTAAAG TAATCGAGCT 3780 

ATGCCAACTT GGGGTGGTAA CAGAGTACTT CCCACCACAG TGTTGAAAGG GAGAGCAAAG 3840 

TCTTATGGAT AAACCCTCCT TTCTTTTGGG GACACATGGC TCTCACTTGA GAAGCTCACC 3900 

TGTGCTGAAT GTCCACATGG TCACTAAACA TGTTATCCTT AAACCCCCCG TATGCCTGAG 3960 

TTGAAAGGGC TCTCTCTTAT TAGGTTTTCA TGGGAACATG AGGCAGCAAA TCTATTGCTA 4020 

AGACTTTACC AGGCTCAAAT CATCTGAGGC TGATAGATAT TTGACTTGGT AAGACTTAAG 4080 

TAAGGCTCTG GCTCCCAGGG GCATAAGCAA CAGTTTCTTG AATGTGCCAT CTGAGAAGGG 4140 

AGACCCAGGT TATGAGTTTT CCTTTGAACA CATTGGTCTT TTCTCAAAGT TCCTGCCTTG 4200 

CTAGACTGTT AGCTCTTTGA GGACAGGGAC TATGTCTTAT CAATCACTAT TATTTTCCTG 4260 

TTACCTAGCA TGGGACAAGT ACACAACACA TATTTGTTCA ATGAATGAAT GAATGTCTTC 4320 

TAAAAGACTC CTCTGATTGG GAGACCATAT CTATAATTGG GATGTGAATC ATTTCTTCAG 4380 

TGGAATAAGA GCACAACGGC ACAACCTTCA AGGACATATT ATCTACTATG AACATTTTAC 4440 

TGTGAGACTC TTTATTTTGC CTTCTACTTG CGCTGAAATG AAACCAAAAC AGGCCGTTGG 4500 

GTTCCACAAG TCAATATATG TTGGATGAGG ATTCTGTTGC CTTATTGGGA ACTGTGAGAC 4560 

TTATCTGGTA TGAGAAGCCA GTAATAAACC TTTGACCTGT TTTAACCAAT GAAGATTATG 4620 

AATATGTTAA TATGATGTAA ATTGCTATTT AAGTGTAAAG CAGTTCTAAG TTTTAGTATT 4680 

TGGGGGATTG GTTTTTATTA TTTTTTTCCT TTTTGAAAAA TACTGAGGGA TCTTTTGATA 4740 

AAGTTAGTAA TGCATGTTAG ATTTTAGTTT TGCAAGCATG TTGTTTTTCA AATATATCAA 4800 

GTATAGAAAA AGGTAAAACA GTTAAGAAGG AAGGCAATTA TATTATTCTT CTGTAGTTAA 4860 

GCAAACACTT GTTGAGTGCC TGCTATGTGC ACGGCATGGG CCCATATGTG TGAGGAGCTT 4920 

GTCTAATTAT GTAGGAAGCA ATAGATCTCG GTAGTTACGT ATTGGGCAGA TACTTACTGT 4980 

ATGAATGAAA GAACATCACA GTAATCACAA TATCAGAGCT GAATTATCCT CAGTGTAGCT 5040 

TCTTGGAATT CAGTTTCTGG AACTAGAGAT AGAGCATTTA TTAAAAAAAA CTCCTGTTGA 5100 

GACTGTGTCT TATGAACCTC TGAAACGTAC AAGCCTTCAC AAGTTTAACT AAATTGGGAT 5160 

TAATCTTTCT GTAGTTATCT GCATAATTCT TGTTTTTCTT TCCATCTGGC TCCTGGGTTG 5220 

ACAATTTGTG GAAACAACTC TATTGCTACT ATTTAAAAAA AATCAGAAAT CTTTCCCTTT 5280 

AAGCTATGTT AAATTCAAAC TATTCCTGCT ATTCCTGTTT TGTCAAAGAA TTATATTTTT 5340 

CAAAATATGT TTATTTGTTT GATGGGTCCC AGGAAACACT AATAAAAACC ACAGAGACCA 5400 

GCCTGGAAAA AAAAAAAAAA AAAAAAA 5427 

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5510 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

ATCGCTCTCT GGTAGAGTTG ACGTGACACT CATTTCTGTT GTGGGTGGGG CCCTGGTTGG 60 

GAGGCATTGG CTCCACTGCA GCCTGGGTGT CTAGAGACCA CATTCTCACC CTGCCTTTGT 120 

TACTGGGAAA CCGAACGCGG CGCTGTGGCT TTCAGCTTGG GTAAGCCGGG TCTGCGGCGG 180 

GGATTGCCAT CTGAAGACAG AGGCAGGAGG GCAGCCACAC CTTGCCCAGA TCATGATTCT 240 

GGAAGGAGGT GGTGTAATGA ATCTCAACCC CGGCAACAAC CTCCTTCACC AGCCGCCAGC 300 

CTGGACAGAC AGCTACTCCA CGTGCAATGT TTCCAGTGGG TTTTTTGGAG GCCAGTGGCA 360 

TGAAATTCAT CCTCAGTACT GGACCAAGTA CCAGGTGTGG GAGTGGCTCC AGCACCTCCT 420 

GGACACCAAC CAGCTGGATG CCAATTGTAT CCCTTTCCAA GAGTTCGACA TCAACGGCGA 480 

GCACCTCTGC AGCATGAGTT TGCAGGAGTT CACCCGGGCG GCAGGGACGG CGGGGCAGCT 540 

CCTCTACAGC AACTTGCAGC ATCTGAAGTG GAACGGCCAG TGCAGTAGTG ACCTGTTCCA 600 

GTCCACACAC AATGTCATTG TCAAGACTGA ACAAACTGAG CCTTCCATCA TGAACACCTG 660 

GAAAGACGAG AACTATTTAT ATGACACCAA CTATGGTAGC ACAGTAGATT TGTTGGACAG 720 
CAAAACTTTC TGCCGGGCTC AGATCTCCAT GACAACCACC AGTCACCTTC CTGTTGCAGA 780 

GTCACCTGAT ATGAAAAAGG AGCAAGACCC CCCTGCCAAG TGCCACACCA AAAAGCACAA 840 

CCCGAGAGGG ACTCACTTAT GGGAATTCAT CCGCGACATC CTCTTGAACC CAGACAAGAA 900 

CCCAGGATTA ATAAAATGGG AAGACCGATC TGAGGGCGTC TTCAGGTTCT TGAAATCAGA 960 

GGCAGTGGCT CAGCTATGGG GTAAAAAGAA GAACAACAGC AGCATGACCT ATGAAAAGCT 1020 

CAGCCGAGCT ATGAGATATT ACTACAAAAG AGAAATACTG GAGCGTGTGG ATGGACGAAG 1080 

ACTGGTATAT AAATTTGGGA AGAATGCCCG AGGATGGAGA GAAAATGAAA ACTGAAGCTG 1140 

CCAATACTTT GGACACAAAC CAAAACACAC ACCAAATAAT CAGAAACAAA GAACTCCTGG 1200 

ACGTAAATAT TTCAAAGACT ACTTTTCTCT GATATTTATG TACCATGAGG GGAAAAAGAA 1260 

ACTACTTCTA ACGGGAAGAA GAAACACTAC AGTCGATTAA AAAAATTATT TTGTTACTTC 1320 

GAAGTATGTC CTATATGGGG AAAAAACGTA CACAGTTTTC TGTGAAATAT GATGCTGTAT 1380 

GTGGTTGTGA TTTTTTTTCA CCTCTATTGT GAATTCTTTT TCACTGCAAG AGTAACAGGA 1440 

TTTGTAGCCT TGTGCTTCTT GCTAAGAGAA AGAAAAACAA AATCAGAGGG CATTAAATGT 1500 

TTTGTATGTG ACATGATTTA GAAAAAGGTG ATGCATCCTC CTCACATAAG CATCCATATG 1560 

GCTTCGTCAA GGGAGGTGAA CATTGTTGCT GAGTTAAATT CCAGGGTCTC AGATGGTTAG 1620 

GACAAAGTGG ATGGATGCCG GGAAGTTTAA CCTGAGCCTT AGGATCCAAT GAGTGGAGAA 1680 

TGGGGACTTC CAAAACCCAA GGTTGGCTAT AATCTCTGCA TAACCACATG ACTTGGAATG 1740 

CTTAAATCAG CAAGAAGAAT AATGGTGGGG TCTTTATACT CATTCAGGAA TGGTTTATCT 1800 

GATGCCAGGG CTGTCTTCCT TTCTCCCCTT TGGATGGTTG GTGAAATACT TTAATTGCCC 1860 

TGTCTGCTCA CTTCTAGCTA TTTAAGAGAG AACCCAGCTT GGTTCTTTTT TGCTCCAAGT 1920 

GCTTAAAAAT AAGTTGGAAA AAGGAGACGG TGGTGTGGAA ATGGCTGAAG AGTTTGCTCT 1980 

TGTATCCCTA TAGTCCAAGG TTTCTCAATC TGCACAATTG ACATTTTTGG CCGGAGTGTT 2040 

CTTTGTGGTG AGGGCTTTCC TGTGCATTGT AAGATGTTCA GCAGTATCCA CTCATGGTCT 2100 

CTAACCACTT GACACCAGAA ACCCCCCAGC TGTGATAACG CAAAATGTCT CTAGACATCA 2160 

CCAAATGTTC CCTGGGGGTG GCAAATTTGC CCTTGATTGA GAACCACCAG TTTAGCTAGT 2220 

CAATATGAGG ATGGTGGTTT ATTCTCAGAA GAAAAAGATA TGTAAGGTCT TTTAGCTCCT 2280 

TAGAGTGAAG CAAAAGCAAG ACTTCAACCT CAACCTATCT TTATGTTTTA AATATTAGGG 2340 

ACAATAAGTT GAAATAGCTA GAGGAGCTTC TTTTCAGAAC CCCAGATGAG AGCCAATGTC 2400 

AGATAAAGTA AGCATAGCAA TGTAGCAGGA ACTACAATAG AAGACATTTT CACTGGAATT 2460 

ACAAAGCAGA ATTAAAATTA TATTGTAGAA GGAAACACCA AGAAAAGAAT TTCCAGGGAA 2520 

AATCCTCTTT GCAGGTATTA ATTCTTATAA TTTTTTGTCT TTTGGATTAT CTGTTTACTG 2580 

TCTCATCTGA ACTGATCCCA GGTGAACGGT TTATTGCCTA GATTTGTACT CAGAGGAATT 2640 

TTTTTTGTTT TGTTTTGTCT TTTAAGAAAG GAAAGAAAGG ATGAAAAAAA TAAACAGAAA 2700 

ACTCAGCTCA GGCACAATTG TCACCAAGGA GTTAAAAGCT TCTTCTTCAA TAGAGGAATT 2760 

GTTCTGGGGG TCCTGGAGAC TTACCATTGA GCCATGCAAT CTGGGAAGCA CAGGAATAAG 2820 

TAGACACTTT GAAAATGGAT TTGAATGTTC TCATCCCTTT TGCAGCTTTT CTTTTTGGCT 2880 

CTCTCATGTC CTTGGCTTGC TCCTCTATTC TACCTCTCTT TCTCCAGCAA TAATATGCAA 2940 

ATGAAGACAT GTATCCATAA GAAGGAGTGC TCTTCATCAA CTAATAGAGC ACCTACCACA 3000 

GTGTCATACC TGGTAGAGGT GAGCAATTCA TATTCAAAGG TTGCAAAGTG TTTGTAATAT 3060 

ATTCATGAGG CTGGAAGTAA GAAGAATTAA AAATTTGTCC TAATTACAAT GAGAACCATT 3120 

CTAGGTAGTG ATCTTGGAGC ACACATGAAT AACTTTCTGA AGGTGCAACC AAATCCATTT 3180 

TTATTTCTGC CTGGCTTGGT CACCTCTGTA AAGGTTTAAC TTAGTGTTGT CAAGTAACAG 3240 

TTACTGAAAG AGCTGAGAAA AAGAACAATG AACAGCAACG ATCTTGACTG TGCAACTCAG 3300 

ACATTCCTGC AGAAAAGACA TATGTTGCTT TACAAGAAGG CCAAAGAACT ATGGGGCCTT 3360 

CCCAGCATTT GACTGTTCAT TGCATAGAAT GAATTAAATA TCCAGTTACT TGAATGGGTA 3420 

TAACGCATGA ATATTTGTGT GTCTGTGTGT GTGTCTGAGT TGTGTGATTT TATTAGGGGC 3480 

ATCTGCCAAT TCTCTCACTG TGGTTCCTTC TCTGACTTTG CCTGTTCATC ATCTAAGGAG 3540 

GCTAGATCCT TCGCTGACTT CACCATTCCT CAAACCTGTA AGTTTCTCAC TTCTTCCAAA 3600 

TTGGCTTTGG CTCTTTCTTC AACCTTTCCA TTCAAGAGCA ATCTTTGCTA AGGAGTAAGT 3660 

GAATGTGAAG AGTACCAACT ACAACAATTC TACAGATAAT TAGTGGATTG TGTTGTTTGT 3720 

TGAGAGTGAA GGTTTCTTGG CATCTGGTGC CTGATTAAGG CTTGAGTATT AAGTTCTCAG 3780 

CATATCTCTC TATTGTCTTG ACTTGAGTTT GCTGCATTTT CTATGTGCTG TTCGTGACTT 3840 

GGAGAACTTA AAGTAATCGA GCTATGCCAA CTTGGGGTGG TAACAGAGTA CTTCCCACCA 3900 

CAGTGTTGAA AGGGAGAGCA AAGTCTTATG GATAAACCCT CCTTTCTTTT GGGGACACAT 3960 

GGCTCTCACT TGAGAAGCTC ACCTGTGCTG AATGTCCACA TGGTCACTAA ACATGTTATC 4020 

CTTAAACCCC CCGTATGCCT GAGTTGAAAG GGCTCTCTCT TATTAGGTTT TCATGGGAAC 4080 
ATGAGGCAGC AAATCTATTG CTAAGACTTT ACCAGGCTCA AATCATCTGA GGCTGATAGA 4140 

TATTTGACTT GGTAAGACTT AAGTAAGGCT CTGGCTCCCA GGGGCATAAG CAACAGTTTC 4200 

TTGAATGTGC CATCTGAGAA GGGAGACCCA GGTTATGAGT TTTCCTTTGA ACACATTGGT 4260 

CTTTTCTCAA AGTTCCTGCC TTGCTAGACT GTTAGCTCTT TGAGGACAGG GACTATGTCT 4320 

TATCAATCAC TATTATTTTC CTGTTACCTA GCATGGGACA AGTACACAAC ACATATTTGT 4380 

TCAATGAATG AATGAATGTC TTCTAAAAGA CTCCTCTGAT TGGGAGACCA TATCTATAAT 4440 

TGGGATGTGA ATCATTTCTT CAGTGGAATA AGAGCACAAC GGCACAACCT TCAAGGACAT 4500 

ATTATCTACT ATGAACATTT TACTGTGAGA CTCTTTATTT TGCCTTCTAC TTGCGCTGAA 4560 

ATGAAACCAA AACAGGCCGT TGGGTTCCAC AAGTCAATAT ATGTTGGATG AGGATTCTGT 4620 

TGCCTTATTG GGAACTGTGA GACTTATCTG GTATGAGAAG CCAGTAATAA ACCTTTGACC 4680 

TGTTTTAACC AATGAAGATT ATGAATATGT TAATATGATG TAAATTGCTA TTTAAGTGTA 4740 

AAGCAGTTCT AAGTTTTAGT ATTTGGGGGA TTGGTTTTTA TTATTTTTTT CCTTTTTGAA 4800 

AAATACTGAG GGATCTTTTG ATAAAGTTAG TAATGCATGT TAGATTTTAG TTTTGCAAGC 4860 

ATGTTGTTTT TCAAATATAT CAAGTATAGA AAAAGGTAAA ACAGTTAAGA AGGAAGGCAA 4920 

TTATATTATT CTTCTGTAGT TAAGCAAACA CTTGTTGAGT GCCTGCTATG TGCACGGCAT 4980 

GGGCCCATAT GTGTGAGGAG CTTGTCTAAT TATGTAGGAA GCAATAGATC TCGGTAGTTA 5040 

CGTATTGGGC AGATACTTAC TGTATGAATG AAAGAACATC ACAGTAATCA CAATATCAGA 5100 

GCTGAATTAT CCTCAGTGTA GCTTCTTGGA ATTCAGTTTC TGGAACTAGA GATAGAGCAT 5160 

TTATTAAAAA AAACTCCTGT TGAGACTGTG TCTTATGAAC CTCTGAAACG TACAAGCCTT 5220 

CACAAGTTTA ACTAAATTGG GATTAATCTT TCTGTAGTTA TCTGCATAAT TCTTGTTTTT 5280 

CTTTCCATCT GGCTCCTGGG TTGACAATTT GTGGAAACAA CTCTATTGCT ACTATTTAAA 5340 

AAAAATCAGA AATCTTTCCC TTTAAGCTAT GTTAAATTCA AACTATTCCT GCTATTCCTG 5400 

TTTTGTCAAA GAATTATATT TTTCAAAATA TGTTTATTTG TTTGATGGGT CCCAGGAAAC 5460 

ACTAATAAAA ACCACAGAGA CCAGCCTGGA AAAAAAAAAA AAAAAAAAAA 5510 

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5667 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

ATCGCTCTCT GGTAGAGTTG ACGTGACACT CATTTCTGTT GTGGGTGGGG CCCTGGTTGG 60 

GAGGCATTGG CTCCACTGCA GCCTGGGTGT CTAGAGACCA CATTCTCACC CTGCCTTTGT 120 

TACTGGGAAA CCGAACGCGG CGCTGTGGCT TTCAGCTTGG GTAAGCCGGG TCTGCGGCGG 180 

GGATTGCCAT CTGAAGACAG AGGCAGGAGG GCAGCCACAC CTTGCCCAGC TGCACACCCA 240 

GTTUVCAAGTT TCCTCAGTGC GGGTATCTGC CACAGGCTGG GCTGGTCATC AAAGGGCCTC 300 

AGTCATATTT TAATAGAGCT CTTCAAGTAT CTGGCTTTGT GATAATATCA GGAATCAGTT 360 

GGTTTCTCTG ACAGACACTG CCCATTATCA TGATTCTGGA AGGAGGTGGT GTAATGAATC 420 

TCAACCCCGG CAACAACCTC CTTCACCAGC CGCCAGCCTG GACAGACAGC TACTCCACGT 480 

GCAATGTTTC CAGTGGGTTT TTTGGAGGCC AGTGGCATGA AATTCATCCT CAGTACTGGA 540 

CCAAGTACCA GGTGTGGGAG TGGCTCCAGC ACCTCCTGGA CACCAACCAG CTGGATGCCA 600 

ATTGTATCCC TTTCCAAGAG TTCGACATCA ACGGCGAGCA CCTCTGCAGC ATGAGTTTGC 660 

AGGAGTTCAC CCGGGCGGCA GGGACGGCGG GGCAGCTCCT CTACAGCAAC TTGCAGCATC 720 

TGAAGTGGAA CGGCCAGTGC AGTAGTGACC TGTTCCAGTC CACACACAAT GTCATTGTCA 780 

AGACTGAACA AACTGAGCCT TCCATCATGA ACACCTGGAA AGACGAGAAC TATTTATATG 840 

ACACCAACTA TGGTAGCACA GTAGATTTGT TGGACAGCAA AACTTTCTGC CGGGCTCAGA 900 

TCTCCATGAC AACCACCAGT CACCTTCCTG TTGCAGAGTC ACCTGATATG AAAAAGGAGC 960 

AAGACCCCCC TGCCAAGTGC CACACCAAAA AGCACAACCC GAGAGGGACT CACTTATGGG 1020 

AATTCATCCG CGACATCCTC TTGAACCCAG ACAAGAACCC AGGATTAATA AAATGGGAAG 1080 

ACCGATCTGA GGGCGTCTTC AGGTTCTTGA AATCAGAGGC AGTGGCTCAG CTATGGGGTA 1140 
AAAAGAAGAA CAACAGCAGC ATGACCTATG AAAAGCTCAG CCGAGCTATG AGATATTACT 1200 

ACAAAAGAGA AATACTGGAG CGTGTGGATG GACGAAGACT GGTATATAAA TTTGGGAAGA 1260 

ATGCCCGAGG ATGGAGAGAA AATGAAAACT GAAGCTGCCA ATACTTTGGA CACAAACCAA 1320 

AACACACACC AAATAATCAG AAACAAAGAA CTCCTGGACG TAAATATTTC AAAGACTACT 1380 

TTTCTCTGAT ATTTATGTAC CATGAGGGGA AAAAGAAACT ACTTCTAACG GGAAGAAGAA 1440 

ACACTACAGT CGATTAAAAA AATTATTTTG TTACTTCGAA GTATGTCCTA TATGGGGAAA 1500 

AAACGTACAC AGTTTTCTGT GAAATATGAT GCTGTATGTG GTTGTGATTT TTTTTCACCT 1560 

CTATTGTGAA TTCTTTTTCA CTGCAAGAGT AACAGGATTT GTAGCCTTGT GCTTCTTGCT 1620 

AAGAGAAAGA AAAACAAAAT CAGAGGGCAT TAAATGTTTT GTATGTGACA TGATTTAGAA 1680 

AAAGGTGATG CATCCTCCTC ACATAAGCAT CCATATGGCT TCGTCAAGGG AGGTGAACAT 1740 

TGTTGCTGAG TTAAATTCCA GGGTCTCAGA TGGTTAGGAC AAAGTGGATG GATGCCGGGA 1800 

AGTTTAACCT GAGCCTTAGG ATCCAATGAG TGGAGAATGG GGACTTCCAA AACCCAAGGT 1860 

TGGCTATAAT CTCTGCATAA CCACATGACT TGGAATGCTT AAATCAGCAA GAAGAATAAT 1920 

GGTGGGGTCT TTATACTCAT TCAGGAATGG TTTATCTGAT GCCAGGGCTG TCTTCCTTTC 1980 

TCCCCTTTGG ATGGTTGGTG AAATACTTTA ATTGCCCTGT CTGCTCACTT CTAGCTATTT 2040 

AAGAGAGAAC CCAGCTTGGT TCTTTTTTGC TCCAAGTGCT TAAAAATAAG TTGGAAAAAG 2100 

GAGACGGTGG TGTGGAAATG GCTGAAGAGT TTGCTCTTGT ATCCCTATAG TCCAAGGTTT 2160 

CTCAATCTGC ACAATTGACA TTTTTGGCCG GAGTGTTCTT TGTGGTGAGG GCTTTCCTGT 2220 

GCATTGTAAG ATGTTCAGCA GTATCCACTC ATGGTCTCTA ACCACTTGAC ACCAGAAACC 2280 

CCCCAGCTGT GATAACGCAA AATGTCTCTA GACATCACCA AATGTTCCCT GGGGGTGGCA 2340 

AATTTGCCCT TGATTGAGAA CCACCAGTTT AGCTAGTCAA TATGAGGATG GTGGTTTATT 2400 

CTCAGAAGAA AAAGATATGT AAGGTCTTTT AGCTCCTTAG AGTGAAGCAA AAGCAAGACT 2460 

TCAACCTCAA CCTATCTTTA TGTTTTAAAT ATTAGGGACA ATAAGTTGAA ATAGCTAGAG 2520 

GAGCTTCTTT TCAGAACCCC AGATGAGAGC CAATGTCAGA TAAAGTAAGC ATAGCAATGT 2580 

AGCAGGAACT ACAATAGAAG ACATTTTCAC TGGAATTACA AAGCAGAATT AAAATTATAT 2640 

TGTAGAAGGA AACACCAAGA AAAGAATTTC CAGGGAAAAT CCTCTTTGCA GGTATTAATT 2700 

CTTATAATTT TTTGTCTTTT GGATTATCTG TTTACTGTCT CATCTGAACT GATCCCAGGT 2760 

GAACGGTTTA TTGCCTAGAT TTGTACTCAG AGGAATTTTT TTTGTTTTGT TTTGTCTTTT 2820 

AAGAAAGGAA AGAAAGGATG AAAAAAATAA ACAGAAAACT CAGCTCAGGC ACAATTGTCA 2880 

CCAAGGAGTT AAAAGCTTCT TCTTCAATAG AGGAATTGTT CTGGGGGTCC TGGAGACTTA 2940 

CCATTGAGCC ATGCAATCTG GGAAGCACAG GAATAAGTAG ACACTTTGAA AATGGATTTG 3000 

AATGTTCTCA TCCCTTTTGC AGCTTTTCTT TTTGGCTCTC TCATGTCCTT GGCTTGCTCC 3060 

TCTATTCTAC CTCTCTTTCT CCAGCAATAA TATGCAAATG AAGACATGTA TCCATAAGAA 3120 

GGAGTGCTCT TCATCAACTA ATAGAGCACC TACCACAGTG TCATACCTGG TAGAGGTGAG 3180 

CAATTCATAT TCAAAGGTTG CAAAGTGTTT GTAATATATT CATGAGGCTG GAAGTAAGAA 3240 

GAATTAAAAA TTTGTCCTAA TTACAATGAG AACCATTCTA GGTAGTGATC TTGGAGCACA 3300 

CATGAATAAC TTTCTGAAGG TGCAACCAAA TCCATTTTTA TTTCTGCCTG GCTTGGTCAC 3360 

CTCTGTAAAG GTTTAACTTA GTGTTGTCAA GTAACAGTTA CTGAAAGAGC TGAGAAAAAG 3420 

AACAATGAAC AGCAACGATC TTGACTGTGC AACTCAGACA TTCCTGCAGA AAAGACATAT 3480 

GTTGCTTTAC AAGAAGGCCA AAGAACTATG GGGCCTTCCC AGCATTTGAC TGTTCATTGC 3540 

ATAGAATGAA TTAAATATCC AGTTACTTGA ATGGGTATAA CGCATGAATA TTTGTGTGTC 3600 

TGTGTGTGTG TCTGAGTTGT GTGATTTTAT TAGGGGCATC TGCCAATTCT CTCACTGTGG 3660 

TTCCTTCTCT GACTTTGCCT GTTCATCATC TAAGGAGGCT AGATCCTTCG CTGACTTCAC 3720 

CATTCCTCAA ACCTGTAAGT TTCTCACTTC TTCCAAATTG GCTTTGGCTC TTTCTTCAAC 3780 

CTTTCCATTC AAGAGCAATC TTTGCTAAGG AGTAAGTGAA TGTGAAGAGT ACCAACTACA 3840 

ACAATTCTAC AGATAATTAG TGGATTGTGT TGTTTGTTGA GAGTGAAGGT TTCTTGGCAT 3900 

CTGGTGCCTG ATTAAGGCTT GAGTATTAAG TTCTCAGCAT ATCTCTCTAT TGTCTTGACT 3960 

TGAGTTTGCT GCATTTTCTA TGTGCTGTTC GTGACTTGGA GAACTTAAAG TAATCGAGCT 4020 

ATGCCAACTT GGGGTGGTAA CAGAGTACTT CCCACCACAG TGTTGAAAGG GAGAGCAAAG 4080 

TCTTATGGAT AAACCCTCCT TTCTTTTGGG GACACATGGC TCTCACTTGA GAAGCTCACC 4140 

TGTGCTGAAT GTCCACATGG TCACTAAACA TGTTATCCTT AAACCCCCCG TATGCCTGAG 4200 

TTGAAAGGGC TCTCTCTTAT TAGGTTTTCA TGGGAACATG AGGCAGCAAA TCTATTGCTA 4260 

AGACTTTACC AGGCTCAAAT CATCTGAGGC TGATAGATAT TTGACTTGGT AAGACTTAAG 4320 

TAAGGCTCTG GCTCCCAGGG GCATAAGCAA CAGTTTCTTG AATGTGCCAT CTGAGAAGGG 4380 

AGACCCAGGT TATGAGTTTT CCTTTGAACA CATTGGTCTT TTCTCAAAGT TCCTGCCTTG 4440 

CTAGACTGTT AGCTCTTTGA GGACAGGGAC TATGTCTTAT CAATCACTAT TATTTTCCTG 4500 
TTACCTAGCA TGGGACAAGT ACACAACACA TATTTGTTCA ATGAATGAAT GAATGTCTTC 4560 

TAAAAGACTC CTCTGATTGG GAGACCATAT CTATAATTGG GATGTGAATC ATTTCTTCAG 4620 

TGGAATAAGA GCACAACGGC ACAACCTTCA AGGACATATT ATCTACTATG AACATTTTAC 4680 

TGTGAGACTC TTTATTTTGC CTTCTACTTG CGCTGAAATG AAACCAAAAC AGGCCGTTGG 4740 

GTTCCACAAG TCAATATATG TTGGATGAGG ATTCTGTTGC CTTATTGGGA ACTGTGAGAC 4800 

TTATCTGGTA TGAGAAGCCA GTAATAAACC TTTGACCTGT TTTAACCAAT GAAGATTATG 4860 

AATATGTTAA TATGATGTAA ATTGCTATTT AAGTGTAAAG CAGTTCTAAG TTTTAGTATT 4920 

TGGGGGATTG GTTTTTATTA TTTTTTTCCT TTTTGAAAAA TACTGAGGGA TCTTTTGATA 4980 

AAGTTAGTAA TGCATGTTAG ATTTTAGTTT TGCAAGCATG TTGTTTTTCA AATATATCAA 5040 

GTATAGAAAA AGGTAAAACA GTTAAGAAGG AAGGCAATTA TATTATTCTT CTGTAGTTAA 5100 

GCAAACACTT GTTGAGTGCC TGCTATGTGC ACGGCATGGG CCCATATGTG TGAGGAGCTT 5160 

GTCTAATTAT GTAGGAAGCA ATAGATCTCG GTAGTTACGT ATTGGGCAGA TACTTACTGT 5220 

ATGAATGAAA GAACATCACA GTAATCACAA TATCAGAGCT GAATTATCCT CAGTGTAGCT 5280 

TCTTGGAATT CAGTTTCTGG AACTAGAGAT AGAGCATTTA TTAAAAAAAA CTCCTGTTGA 5340 

GACTGTGTCT TATGAACCTC TGAAACGTAC AAGCCTTCAC AAGTTTAACT AAATTGGGAT 5400 

TAATCTTTCT GTAGTTATCT GCATAATTCT TGTTTTTCTT TCCATCTGGC TCCTGGGTTG 5460 

ACAATTTGTG GAAACAACTC TATTGCTACT ATTTAAAAAA AATCAGAAAT CTTTCCCTTT 5520 

AAGCTATGTT AAATTCAAAC TATTCCTGCT ATTCCTGTTT TGTCAAAGAA TTATATTTTT 5580 

CAAAATATGT TTATTTGTTT GATGGGTCCC AGGAAACACT AATAAAAACC ACAGAGACCA 5640 

GCCTGGAAAA AAAAAAAAAA AAAAAAA 5667 



(2) INFORMATION FOR SEQ ID NO: 5: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 300 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

Met Ile Leu Glu Gly Gly Gly Val Met Asn Leu Asn Pro Gly Asn Asn 

1 5 10 15 

Leu Leu His Gln Pro Pro Ala Trp Thr Asp Ser Tyr Ser Thr Cys Asn 

20 25 30 

Val Ser Ser Gly Phe Phe Gly Gly Gln Trp His Glu Ile His Pro Gln 

35 40 45 

Tyr Trp Thr Lys Tyr Gln Val Trp Glu Trp Leu Gln His Leu Leu Asp 

50 55 60 

Thr Asn Gln Leu Asp Ala Asn Cys Ile Pro Phe Gln Glu Phe Asp Ile 
65 70 75 80 

Asn Gly Glu His Leu Cys Ser Met Ser Leu Gln Glu Phe Thr Arg Ala 

85 90 95 

Ala Gly Thr Ala Gly Gln Leu Leu Tyr Ser Asn Leu Gln His Leu Lys 

100 105 110 

Trp Asn Gly Gln Cys Ser Ser Asp Leu Phe Gln Ser Thr His Asn Val 

115 120 125 

Ile Val Lys Thr Glu Gln Thr Glu Pro Ser Ile Met Asn Thr Trp Lys 

130 135 140 

Asp Glu Asn Tyr Leu Tyr Asp Thr Asn Tyr Gly Ser Thr Val Asp Leu 
145 150 155 160 

Leu Asp Ser Lys Thr Phe Cys Arg Ala Gln Ile Ser Met Thr Thr Thr 

165 170 175 

Ser His Leu Pro Val Ala Glu Ser Pro Asp Met Lys Lys Glu Gln Asp 
180 185 190 

Pro Pro Ala Lys Cys His Thr Lys Lys His Asn Pro Arg Gly Thr His 

195 200 205 

Leu Trp Glu Phe Ile Arg Asp Ile Leu Leu Asn Pro Asp Lys Asn Pro 

210 215 220 

Gly Leu Ile Lys Trp Glu Asp Arg Ser Glu Gly Val Phe Arg Phe Leu 
225 230 235 240 

Lys Ser Glu Ala Val Ala Gln Leu Trp Gly Lys Lys Lys Asn Asn Ser 

245 250 255 

Ser Met Thr Tyr Glu Lys Leu Ser Arg Ala Met Arg Tyr Tyr Tyr Lys 

260 265 270 

Arg Glu Ile Leu Glu Arg Val Asp Gly Arg Arg Leu Val Tyr Lys Phe 

275 280 285 

Gly Lys Asn Ala Arg Gly Trp Arg Glu Asn Glu Asn 

290 295 300 

(2) INFORMATION FOR SEQ ID NO: 6: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2428 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 



CAGGGGTGCC GGGTTGCTCA GGCCATGGGA GCCACACCTG TTATTGCTGC CTCTGATTTG 60 

TGTGACACTG AGAAGCCCAC AGGCCTGTCC CTCCAACTCG GTGGACCCTC TCTGTGTGCA 120 

TTTGGTGTGT GAGCCAGCTC TGAGAAGGGT TCAGAAGCCA CTGGAGGCAT CTGGGGACCT 180 

CAGCTTCCAT GCCATCTCTG CCTCACTCCC ACAGGGTAAT GTTGGACTCG GTGACACACA 240 

GCACCTTCCT GCCTAATGCA TCCTTCTGCG ATCCCCTGAT GTCGTGGACT GATCTGTTCA 300 

GCAATGAAGA GTACTACCCT GCCTTTGAGC ATCAGACAGC CTGTGACTCA TACTGGACAT 360 

CAGTCCACCC TGAATACTGG ACTAAGCGCC ATGTGTGGGA GTGGCTCCAG TTCTGCTGCG 420 

ACCAGTACAA GTTGGACACC AATTGCATCT CCTTCTGCAA CTTCAACATC AGTGGCCTGC 480 

AGCTGTGCAG CATGACACAG GAGGAGTTCG TCGAGGCAGC TGGCCTCTGC GGCGAGTACC 540 

TGTACTTCAT CCTCCAGAAC ATCCGCACAC AAGGTTACTC CTTTTTTAAT GACGCTGAAG 600 

AAAGCAAGGC CACCATCAAA GACTATGCTG ATTCCAACTG CTTGAAAACA AGTGGCATCA 660 

AAAGTCAAGA CTGTCACAGT CATAGTAGAA CAAGCCTCCA AAGTTCTCAT CTATGGGAAT 720 

TTGTACGAGA CCTGCTTCTA TCTCCTGAAG AAAACTGTGG CATTCTGGAA TGGGAAGATA 780 

GGGAACAAGG AATTTTTCGG GTGGTTAAAT CGGAAGCCCT GGCAAAGATG TGGGGACAAA 840 

GGAAGAAAAA TGACAGAATG ACGTATGAAA AGTTGAGCAG AGCCCTGAGA TACTACTATA 900 

AAACAGGAAT TTTGGAGCGG GTTGACCGAA GGTTAGTGTA CAAATTTGGA AAAAATGCAC 960 

ACGGGTGGCA GGAAGACAAG CTATGATCTG CTCCAGGCAT CAAGCTCATT TTATGGATTT 1020 

CTGTCTTTTA AAACAATCAG ATTGCAATAG ACATTCGAAA GGCTTCATTT TCTTCTCTTT 1080 

TTTTTTAACC TGCAAACATG CTGATAAAAT TTCTCCACAT CTCAGCTTAC ATTTGGATTC 1140 

AGAGTTGTTG TCTACGGAGG GTGAGAGCAG AAACTCTTAA GAAATCCTTT CTTCTCCCTA 1200 

AGGGGATGAG GGGATGATCT TTTGTGGTGT CTTGATCAAA CTTTATTTTC CTAGAGTTGT 1260 

GGAATGACAA CAGCCCATGC CATTGATGCT GATCAGAGAA AAACTATTCA ATTCTGCCAT 1320 

TAGAGACACA TCCAATGCTC CCATCCCAAA GGTTCAAAAG TTTTCAAATA ACTGTGGCAG 1380 

CTCACCATUVO GTGGGGGAAA GCATGATTAG TTTGCAGGTT ATGGTAGGAG AGGGTGAGAT 1440 

ATAAGACATA CATACTTTAG ATTTTAAATT ATTAAAGTCA AAAATCCATA GAAAAGTATC 1500 

CCTTTTTTTT TTTTTTGAGA CGGGTTCTCA CTATGTTGCC CAGGGCTGGT CTTGAACTCC 1560 

TATGCTCAAG TGATCCTCCC ACCTCGGCCT CCCAAAGTAC TGTGATTACA AGCGTGAGCC 1620 

ACGGCACCTG GGCAGAAAAG TATCTTAATT AATGAAAGAG CTAAGCCATC AAGCTGGGAC 1680 
TTAATTGGAT TTAACATAGG TTCACAGAAA GTTTCCTAAC CAGAGCATCT TTTTGACCAC 1740 

TCAGCAAAAC TTCCACAGAC ATCCTTCTGG ACTTAAACAC TTAACATTAA CCACATTATT 1800 

AATTGTTGCT GAGTTTATTC CCCCTTCTAA CTGATGGCTG GCATCTGATA TGCAGAGTTA 1860 

GTCAACAGAC ACTGGCATCA ATTACAAAAT CACTGCTGTT TCTGTGATTC AAGCTGTCAA 1920 

CACAATAAAA TCGAAATTCA TTGATTCCAT CTCTGGTCCA GATGTTAAAC GTTTATAAAA 1980 

CCGGAAATGT CCTAACAACT CTGTAATGGC AAATTAAATT GTGTGTCTTT TTTGTTTTGT 2040 

CTTTCTACCT GATGTGTATT CAAGCGCTAT AACACGTATT TCCTTGACAA AAATAGTGAC 2100 

AGTGAATTCA CACTAATAAA TGTTCATAGG TTAAAGTCTG CACTGACATT TTCTCATCAA 2160 

TCACTGGTAT GTAAGTTATC AGTGACTGAC AGCTAGGTGG ACTGCCCCTA GGACTTCTGT 2220 

TTCACCAGAG CAGGAATCAA GTGGTGAGGC ACTGAATCGC TGTACAGGCT GAAGACCTCC 2280 

TTATTAGAGT TGAACTTCAA AGTAACTTGT TTTAAAAAAT GTGAATTACT GTAAAATAAT 2340 

CTATTTTGGA TTCATGTGTT TTCCAGGTGG ATATAGTTTG TAAACAATGT GAATAAAGTA 2400 

TTTAACATGT TCAAAAAAAA AAAAAAAA 2428 



(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 265 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 



Met Pro Ser Leu Pro His Ser His Arg Val Met Leu Asp Ser Val Thr 

1 5 10 15 

His Ser Thr Phe Leu Pro Asn Ala Ser Phe Cys Asp Pro Leu Met Ser 

20 25 30 

Trp Thr Asp Leu Phe Ser Asn Glu Glu Tyr Tyr Pro Ala Phe Glu His 

35 40 45 

Gln Thr Ala Cys Asp Ser Tyr Trp Thr Ser Val His Pro Glu Tyr Trp 

50 55 60 

Thr Lys Arg His Val Trp Glu Trp Leu Gln Phe Cys Cys Asp Gln Tyr 
65 70 75 80 

Lys Leu Asp Thr Asn Cys Ile Ser Phe Cys Asn Phe Asn Ile Ser Gly 

85 90 95 

Leu Gln Leu Cys Ser Met Thr Gln Glu Glu Phe Val Glu Ala Ala Gly 

100 105 110 

Leu Cys Gly Glu Tyr Leu Tyr Phe Ile Leu Gln Asn Ile Arg Thr Gln 

115 120 125 

Gly Tyr Ser Phe Phe Asn Asp Ala Glu Glu Ser Lys Ala Thr Ile Lys 

130 135 140 

Asp Tyr Ala Asp Ser Asn Cys Leu Lys Thr Ser Gly Ile Lys Ser Gln 
145 150 155 160 

Asp Cys His Ser His Ser Arg Thr Ser Leu Gln Ser Ser His Leu Trp 

165 170 175 

Glu Phe Val Arg Asp Leu Leu Leu Ser Pro Glu Glu Asn Cys Gly Ile 

180 185 190 

Leu Glu Trp Glu Asp Arg Glu Gln Gly Ile Phe Arg Val Val Lys Ser 

195 200 205 

Glu Ala Leu Ala Lys Met Trp Gly Gln Arg Lys Lys Asn Asp Arg Met 

210 215 220 

Thr Tyr Glu Lys Leu Ser Arg Ala Leu Arg Tyr Tyr Tyr Lys Thr Gly 
225 230 235 240 

Ile Leu Glu Arg Val Asp Arg Arg Leu Val Tyr Lys Phe Gly Lys Asn 

245 250 255 

Ala His Gly Trp Gln Glu Asp Lys Leu 
260 265 



(2) INFORMATION FOR SEQ ID NO: 8: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2280 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 



CTGGGAGCGC CTGCCTTCTC TTGCCTTGAA AGCCTCCTCT TTGGACCTAG CCACCGCTGC 60 

CCTCACGGTA ATGTTGGACT CGGTGACACA CAGCACCTTC CTGCCTAATG CATCCTTCTG 120 

CGATCCCCTG ATGTCGTGGA CTGATCTGTT CAGCAATGAA GAGTACTACC CTGCCTTTGA 180 

GCATCAGACA GCCTGTGACT CATACTGGAC ATCAGTCCAC CCTGAATACT GGACTAAGCG 240 

CCATGTGTGG GAGTGGCTCC AGTTCTGCTG CGACCAGTAC AAGTTGGACA CCAATTGCAT 300 

CTCCTTCTGC AACTTCAACA TCAGTGGCCT GCAGCTGTGC AGCATGACAC AGGAGGAGTT 360 

CGTCGAGGCA GCTGGCCTCT GCGGCGAGTA CCTGTACTTC ATCCTCCAGA ACATCCGCAC 420 

ACAAGGTTAC TCCTTTTTTA ATGACGCTGA AGAAAGCAAG GCCACCATCA AAGACTATGC 480 

TGATTCCAAC TGCTTGAAAA CAAGTGGCAT CAAAAGTCAA GACTGTCACA GTCATAGTAG 540 

AACAAGCCTC CAAAGTTCTC ATCTATGGGA ATTTGTACGA GACCTGCTTC TATCTCCTGA 600 

AGAAAACTGT GGCATTCTGG AATGGGAAGA TAGGGAACAA GGAATTTTTC GGGTGGTTAA 660 

ATCGGAAGCC CTGGCAAAGA TGTGGGGACA AAGGAAGAAA AATGACAGAA TGACGTATGA 720 

AAAGTTGAGC AGAGCCCTGA GATACTACTA TAAAACAGGA ATTTTGGAGC GGGTTGACCG 780 

AAGGTTAGTG TACAAATTTG GAAAAAATGC ACACGGGTGG CAGGAAGACA AGCTATGATC 840 

TGCTCCAGGC ATCAAGCTCA TTTTATGGAT TTCTGTCTTT TAAAACAATC AGATTGCAAT 900 

AGACATTCGA AAGGCTTCAT TTTCTTCTCT TTTTTTTTAA CCTGCAAACA TGCTGATAAA 960 

ATTTCTCCAC ATCTCAGCTT ACATTTGGAT TCAGAGTTGT TGTCTACGGA GGGTGAGAGC 1020 

AGAAACTCTT AAGAAATCCT TTCTTCTCCC TAAGGGGATG AGGGGATGAT CTTTTGTGGT 1080 

GTCTTGATCA AACTTTATTT TCCTAGAGTT GTGGAATGAC AACAGCCCAT GCCATTGATG 1140 

CTGATCAGAG AAAAACTATT CAATTCTGCC ATTAGAGACA CATCCAATGC TCCCATCCCA 1200 

AAGGTTCAAA AGTTTTCAAA TAACTGTGGC AGCTCACCAA AGGTGGGGGA AAGCATGATT 1260 

AGTTTGCAGG TTATGGTAGG AGAGGGTGAG ATATAAGACA TACATACTTT AGATTTTAAA 1320 

TTATTAAAGT CAAAAATCCA TAGAAAAGTA TCCCTTTTTT TTTTTTTTGA GACGGGTTCT 1380 

CACTATGTTG CCCAGGGCTG GTCTTGAACT CCTATGCTCA AGTGATCCTC CCACCTCGGC 1440 

CTCCCAAAGT ACTGTGATTA CAAGCGTGAG CCACGGCACC TGGGCAGAAA AGTATCTTAA 1500 

TTAATGAAAG AGCTAAGCCA TCAAGCTGGG ACTTAATTGG ATTTAACATA GGTTCACAGA 1560 

AAGTTTCCTA ACCAGAGCAT CTTTTTGACC ACTCAGCAAA ACTTCCACAG ACATCCTTCT 1620 

GGACTTAAAC ACTTAACATT AACCACATTA TTAATTGTTG CTGAGTTTAT TCCCCCTTCT 1680 

AACTGATGGC TGGCATCTGA TATGCAGAGT TAGTCAACAG ACACTGGCAT CAATTACAAA 1740 

ATCACTGCTG TTTCTGTGAT TCAAGCTGTC AACACAATAA AATCGAAATT CATTGATTCC 1800 

ATCTCTGGTC CAGATGTTAA ACGTTTATAA AACCGGAAAT GTCCTAACAA CTCTGTAATG 1860 

GCAAATTAAA TTGTGTGTCT TTTTTGTTTT GTCTTTCTAC CTGATGTGTA TTCAAGCGCT 1920 

ATAACACGTA TTTCCTTGAC AAAAATAGTG ACAGTGAATT CACACTAATA AATGTTCATA 1980 

GGTTAAAGTC TGCACTGACA TTTTCTCATC AATCACTGGT ATGTAAGTTA TCAGTGACTG 2040 

ACAGCTAGGT GGACTGCCCC TAGGACTTCT GTTTCACCAG AGCAGGAATC AAGTGGTGAG 2100 

GCACTGAATC GCTGTACAGG CTGAAGACCT CCTTATTAGA GTTGAACTTC AAAGTAACTT 2160 

GTTTTAAAAA ATGTGAATTA CTGTAAAATA ATCTATTTTG GATTCATGTG TTTTCCAGGT 2220 

GGATATAGTT TGTAAACAAT GTGAATAAAG TATTTAACAT GTTCAAAAAA AAAAAAAAAA 2280 

(2) INFORMATION FOR SEQ ID NO: 9: 






(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 255 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 



Met Leu Asp Ser Val Thr His Ser Thr Phe Leu Pro Asn Ala Ser Phe 

1 5 10 15 

Cys Asp Pro Leu Met Ser Trp Thr Asp Leu Phe Ser Asn Glu Glu Tyr 

20 25 30 

Tyr Pro Ala Phe Glu His Gln Thr Ala Cys Asp Ser Tyr Trp Thr Ser 

35 40 45 

Val His Pro Glu Tyr Trp Thr Lys Arg His Val Trp Glu Trp Leu Gln 

50 55 60 

Phe Cys Cys Asp Gln Tyr Lys Leu Asp Thr Asn Cys Ile Ser Phe Cys 

65 70 75 80 

Asn Phe Asn Ile Ser Gly Leu Gln Leu Cys Ser Met Thr Gln Glu Glu 

85 90 95 

Phe Val Glu Ala Ala Gly Leu Cys Gly Glu Tyr Leu Tyr Phe Ile Leu 

100 105 110 

Gln Asn Ile Arg Thr Gln Gly Tyr Ser Phe Phe Asn Asp Ala Glu Glu 

115 120 125 

Ser Lys Ala Thr Ile Lys Asp Tyr Ala Asp Ser Asn Cys Leu Lys Thr 

130 135 140 

Ser Gly Ile Lys Ser Gln Asp Cys His Ser His Ser Arg Thr Ser Leu 

145 150 155 160 

Gln Ser Ser His Leu Trp Glu Phe Val Arg Asp Leu Leu Leu Ser Pro 

165 170 175 

Glu Glu Asn Cys Gly Ile Leu Glu Trp Glu Asp Arg Glu Gln Gly Ile 

180 185 190 

Phe Arg Val Val Lys Ser Glu Ala Leu Ala Lys Met Trp Gly Gln Arg 

195 200 205 

Lys Lys Asn Asp Arg Met Thr Tyr Glu Lys Leu Ser Arg Ala Leu Arg 

210 215 220 

Tyr Tyr Tyr Lys Thr Gly Ile Leu Glu Arg Val Asp Arg Arg Leu Val 

225 230 235 240 

Tyr Lys Phe Gly Lys Asn Ala His Gly Trp Gln Glu Asp Lys Leu 

245 250 255 
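
The relationship between the cDNA of SEQ ID NO: 8 and the polypeptide of SEQ ID NO: 9 can be checked with a short translation routine. The following is a minimal sketch, not part of the original listing: it assumes the standard genetic code and assumes that the open reading frame begins at the ATG at base 71 of SEQ ID NO: 8 as printed. The names `translate` and `ORF_START` are illustrative only.

```python
# Illustrative sketch: translate the first 16 codons of the presumed
# open reading frame of SEQ ID NO: 8 using the standard genetic code.

BASES = "TCAG"
# One-letter amino acids for all 64 codons, ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def translate(dna: str) -> str:
    """Translate DNA (spaces allowed) into three-letter residues, stopping at a stop codon."""
    dna = dna.replace(" ", "").upper()
    residues = []
    for i in range(0, len(dna) - 2, 3):
        amino = CODON_TABLE[dna[i:i + 3]]
        if amino == "*":  # stop codon ends the reading frame
            break
        residues.append(THREE_LETTER[amino])
    return " ".join(residues)


# Bases 71-118 of SEQ ID NO: 8, copied from the listing (assumed ORF start).
ORF_START = "ATGTTGGACT CGGTGACACA CAGCACCTTC CTGCCTAATG CATCCTTC"

print(translate(ORF_START))
# Met Leu Asp Ser Val Thr His Ser Thr Phe Leu Pro Asn Ala Ser Phe
```

The printed residues agree with residues 1 to 16 of SEQ ID NO: 9 and with residues 11 to 26 of SEQ ID NO: 7.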





(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2498 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

GAGGGGCTGA CAGCGGCGTC CCTCGTCTGG GCAGCCTCCG CTCTGCCACT CTCCTCCCGT 60 

CCTGAGGATG GGACCCCCGG AAAAGCGGCC TCTGGAGGCC TGCCATGGCA CCCAGAGCAG 120 

CCATTTTCCT CCCAGTTCTG GGGCTTTGGA AGGAGCTTGC GGATGAGGAG AGGGAGCCTC 180 

CGCAGGGCTC TGGCTCCCCT CCAGGGGCCG AGGCCGCACA CAAAGCCGCT CTGTGGCCCA 240 

ATTACACCTA CTGGATAGGA TTGTTGAGGG GACCTGAGAA ACTTGAGACG ACAAGAACGC 300 

GTAGCGCCTC GGCTGGCTGA GGGTGCTGAG CCCTCGTGTT GTGTTCTCTC CAGCTTTCCC 360 

CGTGCCTCAG CCACTCTTCA CGTTCCATCT GTGCTCTGTG CTGACCCGCC TGTGACTCAT 420 

ACTGGACATC AGTCCACCCT GAATACTGGA CTAAGCGCCA TGTGTGGGAG TGGCTCCAGT 480 

TCTGCTGCGA CCAGTACAAG TTGGACACCA ATTGCATCTC CTTCTGCAAC TTCAACATCA 540 

GTGGCCTGCA GCTGTGCAGC ATGACACAGG AGGAGTTCGT CGAGGCAGCT GGCCTCTGCG 600 

GCGAGTACCT GTACTTCATC CTCCAGAACA TCCGCACACA AGGTTACTCC TTTTTTAATG 660 

ACGCTGAAGA AAGCAAGGCC ACCATCAAAG ACTATGCTGA TTCCAACTGC TTGAAAACAA 720 

GTGGCATCAA AAGTCAAGAC TGTCACAGTC ATAGTAGAAC AAGCCTCCAA AGTTCTCATC 780 

TATGGGAATT TGTACGAGAC CTGCTTCTAT CTCCTGAAGA AAACTGTGGC ATTCTGGAAT 840 

GGGAAGATAG GGAACAAGGA ATTTTTCGGG TGGTTAAATC GGAAGCCCTG GCAAAGATGT 900 

GGGGACAAAG GAAGAAAAAT GACAGAATGA CGTATGAAAA GTTGAGCAGA GCCCTGAGAT 960 

ACTACTATAA AACAGGAATT TTGGAGCGGG TTGACCGAAG GTTAGTGTAC AAATTTGGAA 1020 

AAAATGCACA CGGGTGGCAG GAAGACAAGC TATGATCTGC TCCAGGCATC AAGCTCATTT 1080 

TATGGATTTC TGTCTTTTAA AACAATCAGA TTGCAATAGA CATTCGAAAG GCTTCATTTT 1140 

CTTCTCTTTT TTTTTAACCT GCAAACATGC TGATAAAATT TCTCCACATC TCAGCTTACA 1200 

TTTGGATTCA GAGTTGTTGT CTACGGAGGG TGAGAGCAGA AACTCTTAAG AAATCCTTTC 1260 

TTCTCCCTAA GGGGATGAGG GGATGATCTT TTGTGGTGTC TTGATCAAAC TTTATTTTCC 1320 

TAGAGTTGTG GAATGACAAC AGCCCATGCC ATTGATGCTG ATCAGAGAAA AACTATTCAA 1380 

TTCTGCCATT AGAGACACAT CCAATGCTCC CATCCCAAAG GTTCAAAAGT TTTCAAATAA 1440 

CTGTGGCAGC TCACCAAAGG TGGGGGAAAG CATGATTAGT TTGCAGGTTA TGGTAGGAGA 1500 

GGGTGAGATA TAAGACATAC ATACTTTAGA TTTTAAATTA TTAAAGTCAA AAATCCATAG 1560 

AAAAGTATCC CTTTTTTTTT TTTTTGAGAC GGGTTCTCAC TATGTTGCCC AGGGCTGGTC 1620 

TTGAACTCCT ATGCTCAAGT GATCCTCCCA CCTCGGCCTC CCAAAGTACT GTGATTACAA 1680 

GCGTGAGCCA CGGCACCTGG GCAGAAAAGT ATCTTAATTA ATGAAAGAGC TAAGCCATCA 1740 

AGCTGGGACT TAATTGGATT TAACATAGGT TCACAGAAAG TTTCCTAACC AGAGCATCTT 1800 

TTTGACCACT CAGCAAAACT TCCACAGACA TCCTTCTGGA CTTAAACACT TAACATTAAC 1860 

CACATTATTA ATTGTTGCTG AGTTTATTCC CCCTTCTAAC TGATGGCTGG CATCTGATAT 1920 

GCAGAGTTAG TCAACAGACA GTGGCATCAA TTACAAAATC ACTGCTGTTT CTGTGATTCA 1980 

AGCTGTCAAC ACAATAAAAT CGAAATTCAT TGATTCCATC TCTGGTCCCA GATGTTAAAC 2040 

GTTTATAAAA CCGGAAATGT CCTAACAACT CTGTAATGGC AAATTAAATT GTGTGTCTTT 2100 

TTTGTTTTGT CTTTCTACCT GATGTGTATT CAAGCGCTAT AACACGTATT TCCTTGACAA 2160 

AAATAGTGAC AGTGAATTCA CACTAATAAA TGTTCATAGG TTAAAGTCTG CACTGACATT 2220 

TTCTCATCAA TCACTGGTAT GTAAGTTATC AGTGACTGAC AGCTAGGTGG ACTGCCCCTA 2280 

GGACTTCTGT TTCACCAGAG CAGGAATCAA GTGGTGAGGC ACTGAATCGC TGTACAGGCT 2340 

GAAGACCTCC TTATTAGAGT TGAACTTCAA AGTAACTTGT TTTAAAAAAT GTGAATTACT 2400 

GTAAAATAAT CTATTTTGGA TTCATGTGTT TTCCAGGTGG ATATAGTTTG TAAACAATGT 2460 

GAATAAAGTA TTTAACATGT TCAAAAAAAA AAAAAAAA 2498 

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 164 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

Met Thr Gln Glu Glu Phe Val Glu Ala Ala Gly Leu Cys Gly Glu Tyr 

1 5 10 15 

Leu Tyr Phe Ile Leu Gln Asn Ile Arg Thr Gln Gly Tyr Ser Phe Phe 

20 25 30 

Asn Asp Ala Glu Glu Ser Lys Ala Thr Ile Lys Asp Tyr Ala Asp Ser 

35 40 45 

Asn Cys Leu Lys Thr Ser Gly Ile Lys Ser Gln Asp Cys His Ser His 

50 55 60 

Ser Arg Thr Ser Leu Gln Ser Ser His Leu Trp Glu Phe Val Arg Asp 

65 70 75 80 

Leu Leu Leu Ser Pro Glu Glu Asn Cys Gly Ile Leu Glu Trp Glu Asp 

85 90 95 

Arg Glu Gln Gly Ile Phe Arg Val Val Lys Ser Glu Ala Leu Ala Lys 

100 105 110 

Met Trp Gly Gln Arg Lys Lys Asn Asp Arg Met Thr Tyr Glu Lys Leu 

115 120 125 

Ser Arg Ala Leu Arg Tyr Tyr Tyr Lys Thr Gly Ile Leu Glu Arg Val 

130 135 140 

Asp Arg Arg Leu Val Tyr Lys Phe Gly Lys Asn Ala His Gly Trp Gln 

145 150 155 160 

Glu Asp Lys Leu 

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 
AAATGAGCCA ATGTTTGTAA T 21 

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 
AAATGAGCCA GTGTTTGTAA T 21 
(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 736 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 






(ii) MOLECULE TYPE: Genomic DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 



AGGAAGTGAA GAACCTAGAT AATCCACCAA CCGGATAATC AGCTCTTGCA TATTTGAGAG 60 

TTGACTGCTT GACCTAAGCA TCTCCTCATA AGGTACCCTC CCTCCCAGGA CCTTCCCTTT 120 

CAAACCTCTC AAGGCTCTTA CCTGGGGCCA GGGGAGATAG GCTTTTCAAA GTCCATTGAA 180 

TTGCCAAGAG TCTCTGTCAA GAAGGCAGTC ATGGTGCCTG GAGAGGGAAC TTGCTGGGAG 240 

CCCCTTCAGA GCCTGGTACT TATAGAGCTA GGGAAAAGAT CTTGATGCCA AAGCAGGGTG 300 

GACTAAATAC AGACTAATAA ATGAGACAGG TGCTCAAGAG GGCCCCTCCA TACCATCATC 360 

TCCTCCAGAT TTGGACTTCT ACTCACTTTG CTTTTACATT CCCTCTTCCC GATGGTGTCT 420 

TTGGTGAGCA GGGTGCTTTT CACCTGAAAC AGCCTCTGAG CTGAAAAGAA CAGTCACCAC 480 

CAAATCAATT CCTCATCCAT TAACAGGTTG TCTCTCTGTT CTTGAGACAC AGGCATTACC 540 

TGGTTAGACC TGTTTTGTTT GAACACTAAC GTGTGAGTTG GCCAAATGCA AATGAGCCAA 600 

TGTTTGTAAT CCTTTATTTT ATTTTTTTAA AGGGCTGGGT AGCCAATCAG AAGAGGGGGA 660 

AGTGACTTAG GGAATTCCCG GTTGGTGGCT TATTGCTTAA CATCCTACAA AATGATTTAA 720 

AATTATTGTT ATATGC 736 
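
The 21-base sequence of SEQ ID NO: 12 appears within the genomic fragment of SEQ ID NO: 14. The following minimal sketch, which is not part of the original listing, shows one way to locate it from the printed rows. The helper `parse_listing_lines` and the constant names are hypothetical, and only the two rows covering bases 541 to 660 are copied in.

```python
import re


def parse_listing_lines(lines):
    """Join printed sequence rows, dropping block spaces and the trailing position number."""
    return "".join(
        "".join(re.findall(r"[A-Za-z]+", line)) for line in lines
    ).upper()


# Rows of SEQ ID NO: 14 covering bases 541-660, copied from the listing.
ROWS_541_TO_660 = [
    "TGGTTAGACC TGTTTTGTTT GAACACTAAC GTGTGAGTTG GCCAAATGCA AATGAGCCAA 600",
    "TGTTTGTAAT CCTTTATTTT ATTTTTTTAA AGGGCTGGGT AGCCAATCAG AAGAGGGGGA 660",
]
FIRST_POSITION = 541  # 1-based position of the first base in the rows above

SEQ_ID_12 = "AAATGAGCCAATGTTTGTAAT"

fragment = parse_listing_lines(ROWS_541_TO_660)
offset = fragment.find(SEQ_ID_12)  # 0-based offset, or -1 if absent
position = FIRST_POSITION + offset if offset >= 0 else None

print(position)
# 590
```

Under these assumptions the oligonucleotide spans bases 590 to 610 of SEQ ID NO: 14.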



(2) INFORMATION FOR SEQ ID NO: 15: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 333 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 



GCCAGAGTCC TCCTTGAGAA CTTACAATGT GTCCATATTA AGGATCTGCT GTGTTTGATG 60 

ATTTTGTGAT TACACTTTAA ACTTCTTATC CATAAAGGAC ATACTTGATA TATCTGAGAC 120 

TTGTAGTAGA AGGCCTTGAG ACATCCATCT CATCCCATCA TTATCTATCT ATCATCTATC 180 

TATCTATCTA TCTATCTATC TATCTATCTA TCTATCATCT ATCTATCTAT CGCCAGTACT 240 

GTCTTGTTGA AGTTGGCAGT AGGGTGAAAG ACCTCAAACT CCAAAGGACT TTCCGTATGG 300 

ATGCAATATA CCTGCAATTC TAGCTTTTCT GTG 333 



(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

ACAGAATGAC RTATGAAAAG T 21 

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17 
GTAACCAAGC KCAAGCCACC C 

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18 
AAGGAGCCCA YCTGAGTGCA G 

(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19 
CGTTCCATCT STGCTCTGTG C 

(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20 
AGCGCCTCGG YTGGCTGAGG G 

(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21 
TGTATTCAAG YGCTATAACA C 

(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22 
CACTGAGAAG CCNACAGGCC TGT 

(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23 
CCCACAGGCC WGTCCCTCCA A 

(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24 
CGTCCATCTC YAGCTCCAGG G 

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25 
GACTTGATAA YGCCCGTGGT G 

(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:26 
ACTTGATAAC RCCCGTGGTG C 

(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27 
CTCCCCTCCA WGAGCCACAG C 

(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28 
ATTTCCTGCA TNGTCTGGAC TT 

(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29 

ATCCAAACAC YTGAGTGGAA A 

(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30 
AGTTTCCTCA RTGCGGGAGC T 

(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31 
GCGAGCACCT YTGCAGCATG A 

(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32 
TTCACCCGGG YGGCAGGGAC G 

(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:33: 
CTGGGGAAAA NNGATCGCTG AC 

(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34 

GTCAATTAAA YGGCTCTCAT T 

(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35 
TAGATCATTC RTAACCTGCC T 

(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36 
AAAGAGAAAT WCTGGAGCGT G 

(2) INFORMATION FOR SEQ ID NO:37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 
ATGAGGGGAA MAAGAAACTA C 

(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:38: 
TTTTGTATGT KACATGATTT A 

(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39 
AGCTTGGTTC YTTTTTGCTC C 

(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40 
TTGACACCAG RAACCCCCCA G 

(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41 
AAATGAGCCA RTGTTTGTAA T 
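
Several of the 21-base sequences in this listing carry a single ambiguity symbol (R, Y, K, M, S, W or N) at the central position. The following minimal sketch, which is not part of the original listing, expands such a sequence into the concrete sequences it covers, assuming the standard IUPAC nucleotide codes. How the specification intends N to be read is not stated in this listing, so N is treated here simply as any base. The names `IUPAC` and `expand` are illustrative only.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes (assumed to apply to this listing).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def expand(sequence: str) -> list:
    """Return every unambiguous sequence matched by an IUPAC-coded sequence."""
    sequence = sequence.replace(" ", "").upper()
    choices = [IUPAC[symbol] for symbol in sequence]
    return ["".join(bases) for bases in product(*choices)]


# SEQ ID NO: 41 as printed in the listing.
for allele in expand("AAATGAGCCA RTGTTTGTAA T"):
    print(allele)
# AAATGAGCCAATGTTTGTAAT
# AAATGAGCCAGTGTTTGTAAT
```

The two expanded sequences are identical to SEQ ID NO: 12 and SEQ ID NO: 13 respectively.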

(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42 

ATCCATTTTG YATTCCTCAT T 

(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43 
CTGGAGCTCA RACCAGACAG C 

(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44 
GCCAGTGCAG SCATCATTAC C 

(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45 
AGTTCAAATC RTAATTTTTA T 

(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46 
TCATCAGAAT YTAAATCTCC C 

(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47 

GGAGATTCAG NTGAAGCAAG A 

(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48 
TTTTTCCACA YCCAGCCTGG C 

(2) INFORMATION FOR SEQ ID NO: 49: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49 
CCCAGCCTGG YGAACCCTGG C 

(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50 
CTCTTCATCA YGGTCAAATA C 

(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51 
CAACTTGCTG YCAAAGTGCT G 

(2) INFORMATION FOR SEQ ID NO: 52: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52 
TACTATGTGC YAGATACTAA G 

(2) INFORMATION FOR SEQ ID NO: 53: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53 
ATGCCACTTT RRGACAACTT GAG 

(2) INFORMATION FOR SEQ ID NO: 54: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54 
CGCATGCCTG KAAAGAAGAG A 

(2) INFORMATION FOR SEQ ID NO: 55: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55 

GGATAAGCAC MAGTGAGCCT G 

(2) INFORMATION FOR SEQ ID NO: 56: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56 
AAAGCCAGAC RGCAACTTGT G 

(2) INFORMATION FOR SEQ ID NO: 57: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:57 
TCTCAAAAAG RGTGATAGGA G 

(2) INFORMATION FOR SEQ ID NO: 58: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58 
TCTGAATCCT STCTCCTCCT T 

(2) INFORMATION FOR SEQ ID NO: 59: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59 
TAGAACCAGG WTGTGGGACC A 

(2) INFORMATION FOR SEQ ID NO: 60: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60 
TTCTTGTGTC RGGCGCAAAA C 

(2) INFORMATION FOR SEQ ID NO: 61: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61 
AACCAACATG RAGAAACCCC A 

(2) INFORMATION FOR SEQ ID NO: 62: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62 
AATAAACTAT RGTTCACCTA G 

(2) INFORMATION FOR SEQ ID NO: 63: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63 
ACATATTTGT RTCTCATATG A 

(2) INFORMATION FOR SEQ ID NO: 64: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64 
CPJAGCAGTT YCTAATAATC C 

(2) INFORMATION FOR SEQ ID NO: 65: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65 
AGATCCTAAC YGGGGCCTCC T 

(2) INFORMATION FOR SEQ ID NO: 66: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66 
CTCTTTCTCT YTGCTTCCTC C 

(2) INFORMATION FOR SEQ ID NO: 67: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67 
TTAGGAATCC WCAAATATGT A 

(2) INFORMATION FOR SEQ ID NO: 68: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68 

GTCTGACTCC RCCTCCCTCA T 

(2) INFORMATION FOR SEQ ID NO: 69: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69: 
GAATCACATC RTGAGAAATG T 

(2) INFORMATION FOR SEQ ID NO: 70: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 70: 

AATTCAATCC YTCACAGACT T 

(2) INFORMATION FOR SEQ ID NO: 71: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71: 
GTGTAGCCAG RGTTGCTAAT T 

(2) INFORMATION FOR SEQ ID NO: 72: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 72: 
CCTAGAAATA SCCAAGGGCA C 

(2) INFORMATION FOR SEQ ID NO: 73: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 73 
AAATTCTCAT RCCTCACCCT C 

(2) INFORMATION FOR SEQ ID NO: 74: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 74 
TCCCACCCCT RTCACCTTCA T 

(2) INFORMATION FOR SEQ ID NO: 75: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 75 
CCTCATTCTC RGAAGCCAAC A 

(2) INFORMATION FOR SEQ ID NO: 76: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 76 
GAAGAGCCGT YCAGTCCCTT T 

(2) INFORMATION FOR SEQ ID NO: 77: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 77 
TCCATAGGCT YTTTATTTGG C 

(2) INFORMATION FOR SEQ ID NO: 78: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 78 
TCGTTTAGTA YACAGGCTTT G 

(2) INFORMATION FOR SEQ ID NO: 79: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 79 
GCCTCAGTTG YCCCAGCTAT A 

(2) INFORMATION FOR SEQ ID NO: 80: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 80 
AGCAAAATGC WCTATGCACT G 

(2) INFORMATION FOR SEQ ID NO: 81: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 81 

GTGTCCTGAC NNNNNNNNNN NACACTGCCT G 

(2) INFORMATION FOR SEQ ID NO: 82: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 82 
ATCAGATAAC RCCTACACTT A 

(2) INFORMATION FOR SEQ ID NO: 83: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 83 
TCTCTCTTCT SCCTGCCCTG T 

(2) INFORMATION FOR SEQ ID NO: 84: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 84 
TGGACACAGG KAGGGGAATA T 

(2) INFORMATION FOR SEQ ID NO: 85: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 85 
TGTCACTTGC RCATACAAGG C 

(2) INFORMATION FOR SEQ ID NO: 86: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 86 
ATCATCAGAT YAGCCCAGAA T 

(2) INFORMATION FOR SEQ ID NO: 87: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 87 
TCAACAGAGA RAGTTAATGG T 

(2) INFORMATION FOR SEQ ID NO: 88: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 88 
AGCAATAATG YTTCCCTTTT C 

(2) INFORMATION FOR SEQ ID NO: 89: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 89 
TCTAGCTTTT YTGTGTTTTT T 

(2) INFORMATION FOR SEQ ID NO: 90: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 90 
GATTCCTTAA YGCTTGATAC T

(2) INFORMATION FOR SEQ ID NO: 91:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 91 
CCTCCTCCAG YACCAAAGTG G 

(2) INFORMATION FOR SEQ ID NO: 92: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 92 
ATGGCCACAG RTCAAATCCT G 

(2) INFORMATION FOR SEQ ID NO: 93: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 93 
ACTGAGTGTT YATGCCAATT T 

(2) INFORMATION FOR SEQ ID NO: 94: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 94 

GACAAGCCCT RTCTGACACA C 

(2) INFORMATION FOR SEQ ID NO: 95: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 95 
TGAAAAGCCT YCTTGCTGCC T 

(2) INFORMATION FOR SEQ ID NO: 96: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 96 

TCCTGGAGTT YCTTTGCTCC C 

(2) INFORMATION FOR SEQ ID NO: 97: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 97 
GATTCCAAAT WAACTAAAGA T 

(2) INFORMATION FOR SEQ ID NO: 98: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 98 
GACCTCAAGT CRTCCACCCG CC 

(2) INFORMATION FOR SEQ ID NO: 99: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 99
AACAAATACT MCCCCGCAAC CC 

(2) INFORMATION FOR SEQ ID NO: 100: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 100 
ATTTTTTTTT NAAGGAAAAT A 

(2) INFORMATION FOR SEQ ID NO: 101: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 101 
AAATTTCCCC MAAACAAGCA G 

(2) INFORMATION FOR SEQ ID NO: 102: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 102 
GAGT^GGGT RTGTGTGTGT G 

(2) INFORMATION FOR SEQ ID NO: 103: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 103 
GTGTGTGTGT NNNNGTATGT GCGCGTG

(2) INFORMATION FOR SEQ ID NO: 104:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 104 
ATCGGGAACC YCATACCCCA A 

(2) INFORMATION FOR SEQ ID NO: 105: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 105 
TTTGTTTCGC MATGAGGTAC G 

(2) INFORMATION FOR SEQ ID NO: 106: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 106 
TGAGGGTGTT STGGGCTGGA C 

(2) INFORMATION FOR SEQ ID NO: 107: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 107 

TCTTCATTGG YATCTGAATG T 

(2) INFORMATION FOR SEQ ID NO: 108: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 108
GCGAGCACCT YTGCAGCATG A 

(2) INFORMATION FOR SEQ ID NO: 109: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 109
AACCCCCCCC MCACACACAC A 

(2) INFORMATION FOR SEQ ID NO: 110: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 110 
TCAGTGCTCT STAATCAGTC A 

(2) INFORMATION FOR SEQ ID NO: 111: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 111 
TCTTTGTGAA ANNAATTAGT CTG 

(2) INFORMATION FOR SEQ ID NO: 112: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 112
GCTGCCCTGA SAGCTGGGCC A 

(2) INFORMATION FOR SEQ ID NO: 113: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 113 
CCTTCTGATC YTTGTTTGCT G 

(2) INFORMATION FOR SEQ ID NO: 114: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 114 
GGAACACTGA KTCTTGATTA G 

(2) INFORMATION FOR SEQ ID NO: 115: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 115
TAGGCTTCTC YTGATAATTG A 

(2) INFORMATION FOR SEQ ID NO: 116: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 116 
TCTTAAAATA MTTGGCTTGT A

(2) INFORMATION FOR SEQ ID NO: 117:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 117 
TAGATCATTA RTAACCTGCC T 

(2) INFORMATION FOR SEQ ID NO: 118: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 118 
ATGAGGGGAA MAAGAAACTA C

(2) INFORMATION FOR SEQ ID NO: 119: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 119 
TTGACACCAG RAACCCCCCA G 

(2) INFORMATION FOR SEQ ID NO: 120: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 120 

TGTTTTAAAT RTTAGGGACA A 

(2) INFORMATION FOR SEQ ID NO: 121: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 121 
GTAAGCATAG YAATGTAGCA G 

(2) INFORMATION FOR SEQ ID NO: 122: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 122 
GGCTCTTTCT KCAACCTTTC C 

(2) INFORMATION FOR SEQ ID NO: 123: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 123
GACCCAGGTT RTGAGTTTTC C 

(2) INFORMATION FOR SEQ ID NO: 124: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 124 
GACAGAATGA YATATGAAAA G 

(2) INFORMATION FOR SEQ ID NO: 125: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Other

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 125 
TGTGTGACAC YGAGAAGCCC A 

(2) INFORMATION FOR SEQ ID NO: 126: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 126 
AGTACTGGAC MAAGTACCAG G 

(2) INFORMATION FOR SEQ ID NO: 127: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 127
CCTGGGAGCA RGTATTGCAT T 

(2) INFORMATION FOR SEQ ID NO: 128: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 128
AGATTTGAGG YCTCAGGTCC C 

(2) INFORMATION FOR SEQ ID NO: 129:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Other

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 129 
TGTCAATGTC RCATGATAAG C 

(2) INFORMATION FOR SEQ ID NO: 130: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 130 
TTGCCCCAGT KTTCTCCGGG C 

(2) INFORMATION FOR SEQ ID NO: 131: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 131
TATGAGCAGC RTAGGGAGTG G 

(2) INFORMATION FOR SEQ ID NO: 132: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 132 
AGTTGACTGA AAAANTAAAT AAGAC

(2) INFORMATION FOR SEQ ID NO: 133:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Other

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 133
ATTCAAATAG SCTCTAGAAA C 

(2) INFORMATION FOR SEQ ID NO: 134: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 134 
CCCAGAATTT MATATCCATT C 

(2) INFORMATION FOR SEQ ID NO: 135: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 135 
TGACCCAACA RAAACTCACT G 

(2) INFORMATION FOR SEQ ID NO: 136: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 136 
CCAGAATATA WCATCAGCCC T 

(2) INFORMATION FOR SEQ ID NO: 137: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Other

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 137 
CATCAGCCCT WCTGAGGAGA T 

(2) INFORMATION FOR SEQ ID NO: 138: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 138 
CCAGAACAGA YTTTATTCTG T 

(2) INFORMATION FOR SEQ ID NO: 139: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 139 
TTCAGCCATC YTTCCAOTTG T 

(2) INFORMATION FOR SEQ ID NO: 140: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 140 
TCACTAACTC WAAAACGACA T 

(2) INFORMATION FOR SEQ ID NO: 141: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Other

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 141 
AACTCAAAAA YGACATCCTC C 

(2) INFORMATION FOR SEQ ID NO: 142: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 142 
GAACTGCACA RGTTGCACAC T 

(2) INFORMATION FOR SEQ ID NO: 143:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 143
TTGTTCCATG SACTACCTCC T 

(2) INFORMATION FOR SEQ ID NO: 144: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 144 
ACAGCAGGCA YTCAACAAAT T 

(2) INFORMATION FOR SEQ ID NO: 145: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Other

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 145 
TTATTTTTGG STTTGTTTTA A

(2) INFORMATION FOR SEQ ID NO: 146: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 146 
TAGGCTGTTC YCTGCCATCA C 

(2) INFORMATION FOR SEQ ID NO: 147: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 147
GTGCTCTGGG MCACACAGCT C 

(2) INFORMATION FOR SEQ ID NO: 148: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 148
AGACCCGATA RGAGCTCCTT C 

(2) INFORMATION FOR SEQ ID NO: 149:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Other

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 149
CATCTTGCGC RGTCATGTAA G 

(2) INFORMATION FOR SEQ ID NO: 150: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 150 
CAGCACAGCT RTTCCCTCAA A 

(2) INFORMATION FOR SEQ ID NO: 151: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 151 
TTTGGAAACA YGGTGAAGTA T 

(2) INFORMATION FOR SEQ ID NO: 152: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 152 
ACACGGTGAA RTATTGTCTC C 

(2) INFORMATION FOR SEQ ID NO: 153: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Other

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 153 
AAAAGTGGAT MCTCTGCAAA C 

(2) INFORMATION FOR SEQ ID NO: 154: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 154 
CTTCAAATGC RGCTATTAAA G 

(2) INFORMATION FOR SEQ ID NO: 155: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 155 
CCTGGGAGCA YGGTAAATCA G 

(2) INFORMATION FOR SEQ ID NO: 156: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 156 
TGAAAATGTC RCTTTCTCAC CT 

(2) INFORMATION FOR SEQ ID NO: 157: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Other

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 157 
CCTGATATTT RCCTACAAGA A

(2) INFORMATION FOR SEQ ID NO: 158: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 158 
AAAGGGTTAG YTTGTCCCCT T 

(2) INFORMATION FOR SEQ ID NO: 159: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 159 
TGAAAATAAA ASACAATTTT TT 

(2) INFORMATION FOR SEQ ID NO: 160: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 160 
CTGCTGTGGA CGAATAGG 

(2) INFORMATION FOR SEQ ID NO: 161:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 161
TCAATATAAT CTTGCTTAAC TTGG 

(2) INFORMATION FOR SEQ ID NO: 162: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 162 
GACCTGTTTG GGTTGATTTC AG 

(2) INFORMATION FOR SEQ ID NO: 163: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 163 
GTTTCTTACA GTGTCTTGCT ATCACATCAC C 

(2) INFORMATION FOR SEQ ID NO: 164: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 164 
GAGGACTGGC AGTACCAAGT AAAC 

(2) INFORMATION FOR SEQ ID NO: 165: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 165 
GTTTCTTTGG TTCATTCTAA GATGGCTGG

(2) INFORMATION FOR SEQ ID NO: 166:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 166 
GCTGAGGCAG GAGAAAAGAC AAG

(2) INFORMATION FOR SEQ ID NO: 167: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 167 
GTTTCTTCAT GCAAAGGTCA GGAGGTAGG 

(2) INFORMATION FOR SEQ ID NO: 168: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 168 
GTTGCTTCCA GACGAGGTAC ATG 

(2) INFORMATION FOR SEQ ID NO: 169: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 169 

GTTTCTTCAA TGGCTCCACA AACATCTCTG 

(2) INFORMATION FOR SEQ ID NO: 170:

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 170 
AGGTTTAGGG GACAGGGTTT GG 

(2) INFORMATION FOR SEQ ID NO: 171: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 171 

GTTTCTTTCC TGGCTAACAC GGTGAAATC 

(2) INFORMATION FOR SEQ ID NO: 172: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 172 
GTTTCTTATT GCCTCCTCCC AAAATTC 

(2) INFORMATION FOR SEQ ID NO: 173: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 173
AGAGGCCACT GGAAGACGAA

(2) INFORMATION FOR SEQ ID NO: 174: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 174
AACTGGAGTC AGGCAAAACG TG 

(2) INFORMATION FOR SEQ ID NO: 175: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 175 
GTTTCTTTGG CTGGTAAGGA AAGAAACCAC 

(2) INFORMATION FOR SEQ ID NO: 176: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 176
GGCTAGGTTC ATAAACTCTG TGCTG 

(2) INFORMATION FOR SEQ ID NO: 177: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 177
GTTTCTTGAT TGTTTGAGAT CCTTGACCCA G 

(2) INFORMATION FOR SEQ ID NO: 178: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 178 
GCCGAAATCA CAACACTGCA TC

(2) INFORMATION FOR SEQ ID NO: 179:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 179 
GTTTCTTGAT TCTGCTCTTA CTCTTGCCCC 

(2) INFORMATION FOR SEQ ID NO: 180: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 180 
GTAATAGAAC CAAAGGGCTG AGAC 

(2) INFORMATION FOR SEQ ID NO: 181: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 181 
GTTTCTTCGG AGTCAGACCT TACATTGTTG AG 

(2) INFORMATION FOR SEQ ID NO: 182: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 182 

ATCTCCCTGC TACCCACCTT 

(2) INFORMATION FOR SEQ ID NO: 183: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 183 
GTTTCTTGTT TTCAGTGAGT TTCTGTTGGG 

(2) INFORMATION FOR SEQ ID NO: 184: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 184 
GTGTGCCAAA CAACATTTGC 

(2) INFORMATION FOR SEQ ID NO: 185: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 185 
GTTTCTTCAA GCCATCAAGC TAGAGTGG 

(2) INFORMATION FOR SEQ ID NO: 186:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 186 
GGGCTTTTAA ACCCTTATTT AACC 

(2) INFORMATION FOR SEQ ID NO: 187: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 187
GTTTCTTAGG TGATCTCAGA GCCACTCA 

(2) INFORMATION FOR SEQ ID NO: 188: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 188 
AGGGCAGGTG GGAACTTACT 

(2) INFORMATION FOR SEQ ID NO: 189: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 189 
GTTTCTTTGG AGTCAGTTGA GCTTTCTACC 

(2) INFORMATION FOR SEQ ID NO: 190: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 190 
TGAACTTGCC TACCTCCCAG 

(2) INFORMATION FOR SEQ ID NO: 191: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 191 
GTTTCTTAGG ATATATCCTT ACACAAGCAC A

(2) INFORMATION FOR SEQ ID NO: 192:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 192 
CATGGTTCCA AAGGCAAGTT 

(2) INFORMATION FOR SEQ ID NO: 193: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 193 
GTTTCTTTTG AGGCTGAATG AGCTGTG

(2) INFORMATION FOR SEQ ID NO: 194: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 194 
ACAGGTGGGA AGACTGAATG TC 

(2) INFORMATION FOR SEQ ID NO: 195:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 195 

GTTTCTTGCA GTACACATCA CATGACCTTG 

(2) INFORMATION FOR SEQ ID NO: 196: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 196
GAAATAGGCG GAAACTGGTT C 

(2) INFORMATION FOR SEQ ID NO: 197: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 197 
GTTTCTTCGT TGTGGTTGTT CAGAAAGG 

(2) INFORMATION FOR SEQ ID NO: 198: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 198 
GGTCAAGTGT TCAGAACGCA TC 

(2) INFORMATION FOR SEQ ID NO: 199: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 199 
GTTTCTTGCA GGGATTATGC TAGGTCTGTA G

(2) INFORMATION FOR SEQ ID NO: 200: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 200
AGCACTTCTG AGGAAGGGAC AC 

(2) INFORMATION FOR SEQ ID NO: 201: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 201 
GTTTCTTAGG GCAGGCAGAC ATACAAAC 

(2) INFORMATION FOR SEQ ID NO: 202:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 202
GCCAATGTGT TCCTAGAGCG AC 

(2) INFORMATION FOR SEQ ID NO: 203:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 203
GTTTCTTTTA AAGGGGGTAG GGTGTCACC

(2) INFORMATION FOR SEQ ID NO: 204: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 204
GGAAGGGAAA AGGACAAGGT TTTG

(2) INFORMATION FOR SEQ ID NO: 205:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 205 
GTTTCTTAGC AAGAGCACTG GTGTAGGAGT C 

(2) INFORMATION FOR SEQ ID NO: 206:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 206 
GCTTTTCAAG CACTTGTCTC 

(2) INFORMATION FOR SEQ ID NO: 207: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 207 
TGGGATTGTG ACTTACCATG 

(2) INFORMATION FOR SEQ ID NO: 208:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 208 

ACTTGGTGTC TTATAGAAAG GTG 

(2) INFORMATION FOR SEQ ID NO: 209:

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 25 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 209 
GTTTCTTAGC TGTGTTTGCT GCATC 

(2) INFORMATION FOR SEQ ID NO: 210: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 210 
AGATGTGTGA TGAGATGCAG 

(2) INFORMATION FOR SEQ ID NO: 211: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 211 
GTTTCTTCAA ATAGTGCAAC AAACCC 

(2) INFORMATION FOR SEQ ID NO: 212:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 212 
TGTCATTCTG AAAGTGCTTC C 

(2) INFORMATION FOR SEQ ID NO: 213:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 213
GTTTCTTCTG TAACTAACGA TCTGTAGTGG TG 

(2) INFORMATION FOR SEQ ID NO: 214: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 214
TATCAAGGTA ATATAGTAGC CACGG 

(2) INFORMATION FOR SEQ ID NO: 215: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 215
AGGTCTTTCA TGCAGAGTGG

(2) INFORMATION FOR SEQ ID NO: 216:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 216 
ATTGCCAAAA CTTGGAAGC 

(2) INFORMATION FOR SEQ ID NO:217: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 217
AGGTGACATA TCAAGACCCT G
(2) INFORMATION FOR SEQ ID NO: 218:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 218
TTGTCAACGA AGCCCAC 

(2) INFORMATION FOR SEQ ID NO: 219: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 219 

GTTTCTTGCA AGATTGTGTG TATGGATG 

(2) INFORMATION FOR SEQ ID NO: 220: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 20 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 220 
GCTCTCTATG TGTTTGGGTG 

(2) INFORMATION FOR SEQ ID NO: 221: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 221 

AAGAGTACGC TAGTGGATGG 

(2) INFORMATION FOR SEQ ID NO: 222: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 19 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 222 
TCCATTAGAC CCAGAAAGG 

(2) INFORMATION FOR SEQ ID NO: 223: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:223 
GTTTCTTCAC CAGGCTGAGA TGTTACT 

(2) INFORMATION FOR SEQ ID NO: 224: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 224 
AATCGTTCCT TATCAGGTAA TTTGG 

(2) INFORMATION FOR SEQ ID NO: 225: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 225 
GTTTCTTCAA AGAAAGCAAT TCCATCATAA CA 

(2) INFORMATION FOR SEQ ID NO: 226: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:226 
GCATTTGTTG AAGCAAGCGG 

(2) INFORMATION FOR SEQ ID NO:227: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:227 
CTTTGTTCCT TGGCTGATGG 

(2) INFORMATION FOR SEQ ID NO: 228: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:228 
AATAGTACCA GACACACGTG 

(2) INFORMATION FOR SEQ ID NO: 229: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 229 
CAATGGTTCA CAGCCCTTTT 

(2) INFORMATION FOR SEQ ID NO: 230: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 230 
AGCCTGGGAG ACAGAGTGAG 
(2) INFORMATION FOR SEQ ID NO: 231:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:231 
GTTTCTTGCA CTTTTTGGGG AAGGTG

(2) INFORMATION FOR SEQ ID NO: 232:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 232 
GTTCCTCCCT TCCCTCTCC 

(2) INFORMATION FOR SEQ ID NO: 233:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 233 
GTTTCTTTCA GGGACTGGAT TGTAG 

(2) INFORMATION FOR SEQ ID NO:234: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 234 

GTGTTCTTTA TGTGTAGTTC 

(2) INFORMATION FOR SEQ ID NO: 235:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 235 
GTTTCTTGGC AACAGAGTGA GACTCA 

(2) INFORMATION FOR SEQ ID NO: 236: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 236 
GTGACATCCA GTGTTGGGAG 

(2) INFORMATION FOR SEQ ID NO: 237: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 237 
GTTTCTTCCT AAGCAAGCAA GCAATCA 

(2) INFORMATION FOR SEQ ID NO: 238: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 238 
AAAGGCAATT GGTGGACA

(2) INFORMATION FOR SEQ ID NO: 239:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 239
GTTTCTTTTC AATCCTTGAT GCAAAGT 

(2) INFORMATION FOR SEQ ID NO:240: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 240 
GGTGACAGAG CAAGATTTCG 

(2) INFORMATION FOR SEQ ID NO:241: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 241 
GTTTCTTGTA GAGTTGAGGG AGCAGC 

(2) INFORMATION FOR SEQ ID NO: 242: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 242
CATCCATCTC ATCCCATCAT

(2) INFORMATION FOR SEQ ID NO: 243:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 243 
GTTTCTTTTC ACCCTACTGC CAACTTC 
(2) INFORMATION FOR SEQ ID NO: 244:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 244 
CCGCCATTTT AGAGAGCATA 

(2) INFORMATION FOR SEQ ID NO: 245: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 245 
GTTTCTTTTC TGGGACAATT GGTAGGA 

(2) INFORMATION FOR SEQ ID NO: 246:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 246 
TTTGTGTTAT TATTTCAGGT GC 

(2) INFORMATION FOR SEQ ID NO: 247: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 247 

GTTTCTTGTT TTTTGTTTCA GTTTAGGAAC 

(2) INFORMATION FOR SEQ ID NO: 248: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 24 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:248 
CATACCCAAA TCGTTCTCTT CCTC 

(2) INFORMATION FOR SEQ ID NO: 249: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 249
GTTTCTTGGA AAAGCAAAGG CATCGTAGAG 

(2) INFORMATION FOR SEQ ID NO: 250: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 250 
TACTAACCAA AAGAGTTGGG G 

(2) INFORMATION FOR SEQ ID NO: 251: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 251 
CTATCATTCA GAAAATGTTG GC 

(2) INFORMATION FOR SEQ ID NO: 252: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 252
GTATGGCAGT AGAGGGCATG 

(2) INFORMATION FOR SEQ ID NO: 253: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 253 
AAGGTTACAT TTCAAGAAAT AAAGT 

(2) INFORMATION FOR SEQ ID NO: 254: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 254
CTGTTCAGGC CTCAATATAT ACC

(2) INFORMATION FOR SEQ ID NO: 255:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:255 
AAGAGGATAG GTGGGGTTTG

(2) INFORMATION FOR SEQ ID NO:256: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 256 
CCTCCCACCT AGACACAAT 
(2) INFORMATION FOR SEQ ID NO: 257:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 257
ATATGATCTT TGCATCCCTG 

(2) INFORMATION FOR SEQ ID NO: 258: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 258 
AAGAAAGACC TGGAAGGAAT 

(2) INFORMATION FOR SEQ ID NO: 259: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:259 
ATACAGCAAA ACCTCATCTC

(2) INFORMATION FOR SEQ ID NO: 260: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 260 

CCACCACTTA TTACCTGCAT 

(2) INFORMATION FOR SEQ ID NO: 261: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 20 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 261 
TGAATGAATG AATGAACGAA 

(2) INFORMATION FOR SEQ ID NO: 262: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 262 

AACTGTGATT GTGCCACTGC ACTC 

(2) INFORMATION FOR SEQ ID NO: 263:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 263 
GTTTCTTCAC CGCCTTTATC CCTCAAATG 

(2) INFORMATION FOR SEQ ID NO: 264:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 264 
GATGGGTGGA GGGCAGTTAA AG 

(2) INFORMATION FOR SEQ ID NO: 265: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 265
GTCAAGCAAC TTGTCCAAGG CTAC 

(2) INFORMATION FOR SEQ ID NO: 266: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 266 
CAGGCTATCA GTTTCCTTTG GAG 

(2) INFORMATION FOR SEQ ID NO: 267:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:267 
GGCAGGTAAT ACTGGAGAAT TAGG 

(2) INFORMATION FOR SEQ ID NO: 268:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 268 
GACGGATCTC AGAGCCACTC 

(2) INFORMATION FOR SEQ ID NO: 269:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:269 
GTTTCTTAAA AGATAAGGGC TTTTAAACC 
(2) INFORMATION FOR SEQ ID NO: 270:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:270 
AGTTTCACAG CTTGTTATGG 

(2) INFORMATION FOR SEQ ID NO: 271: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 271 
GGTTGATGAA GTGAGACTTT 

(2) INFORMATION FOR SEQ ID NO: 272: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 272
ATGGTGGATG CATCCTGTG 

(2) INFORMATION FOR SEQ ID NO: 273: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 273 

GTTTCTTGTA TTGACTCCTC CTCTGC 

(2) INFORMATION FOR SEQ ID NO: 274: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 10 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Other

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 274 
CAGTAAACAT 

(2) INFORMATION FOR SEQ ID NO: 275:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 275 
TGTTGAGTGG 

(2) INFORMATION FOR SEQ ID NO: 276:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 276
TCTCCTCTAT GTGCATGT

(2) INFORMATION FOR SEQ ID NO: 277:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 277

ATTCTACATA 

(2) INFORMATION FOR SEQ ID NO: 278: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 10 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Other

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 278
GTGTTTGCAT 

(2) INFORMATION FOR SEQ ID NO: 279: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:279 
ACAAGTTGGC 

(2) INFORMATION FOR SEQ ID NO:280: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 280 
TAGTACCAGA 

(2) INFORMATION FOR SEQ ID NO: 281:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 281 

TACATCCAAG AAAA 

(2) INFORMATION FOR SEQ ID NO:282: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 22 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 282 
GAGACTCTGA CAAATATATA TA 

(2) INFORMATION FOR SEQ ID NO: 283: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 283 
TGTTGATCGC CAAACCAAAA TC 

(2) INFORMATION FOR SEQ ID NO: 284: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 38 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 284 
AATGCATGTA TGTATATGGT GTGGTATGTG TACATATG 

(2) INFORMATION FOR SEQ ID NO: 285:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 285 
CCTCCCAGAA CAATCATGAT AA 

(2) INFORMATION FOR SEQ ID NO: 286: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 86 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 286:

AGACAGTCTC AAAAAATATT TTAAAGAAAA AGCTGGATAA ATAACTAGCT TTAAGAAAAT 60
AAGAAGAAAA AGAAAGAAGA AAGTAA 86



(2) INFORMATION FOR SEQ ID NO: 287:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 86 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 287: 

AACTAGCTTT AAGAAAATAA GAAGAAAAAG AAAGAAGAAA GTAAGAAAGA GAAAGAAAAG 60 
AAAGAAAAGA AAGAGGAATG ATTGAC 86 

(2) INFORMATION FOR SEQ ID NO: 288: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 288: 
CGCGCACATA CACCCTTTCT CT 22 
(2) INFORMATION FOR SEQ ID NO: 289: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:289: 
CAGTAAACAT CATGTTGAGT GG 22 
(2) INFORMATION FOR SEQ ID NO: 290: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 290:
TCTCCTCAAT GTGCATGTGT GCATGAGTGC ACATTCTACA TA 
(2) INFORMATION FOR SEQ ID NO: 291: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 291: 
GTGTTTGCAT GTTGTACAAG TTGGC 

(2) INFORMATION FOR SEQ ID NO: 292:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 292: 
TAGTACCAGA CACGTGCAGG CAAGCGCACC ATACATCCAA GAAAA 
(2) INFORMATION FOR SEQ ID NO: 293: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 293: 
GGAGGCTGAG CAGGGGTGCC 

(2) INFORMATION FOR SEQ ID NO: 294:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 294: 
ACTCCCACAG GTACCTGCAG

(2) INFORMATION FOR SEQ ID NO: 295: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 295 
CTGCCCTCAC GTAAGCGCCT

(2) INFORMATION FOR SEQ ID NO: 296: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 296 
GCTGTTGCAG GGTAATGTTG

(2) INFORMATION FOR SEQ ID NO: 297:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 297 
CATCAGACAG GTGCGTACA 

(2) INFORMATION FOR SEQ ID NO: 298:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 298 
GGCTGGTGAG GAGGGGCTGA 

(2) INFORMATION FOR SEQ ID NO: 299: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 299 
CGCTCTGTGG GTGAGCTTCA 

(2) INFORMATION FOR SEQ ID NO: 300: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 300 
TGTGGTATAG CCCAATTACA

(2) INFORMATION FOR SEQ ID NO: 301:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 301 
AGGGTGCTGA GTGAGTAGTA 

(2) INFORMATION FOR SEQ ID NO: 302: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 302 
TTCTTTTCAG GCCCTCGTGT 

(2) INFORMATION FOR SEQ ID NO: 303: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 303
TGCTGACCCG GTATGGTGGT 

(2) INFORMATION FOR SEQ ID NO: 304: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 304 
TTTGGTGCAG CCTGTGACTC

(2) INFORMATION FOR SEQ ID NO: 305: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:305 
CGCACACAAG GTCAGTGTTC 

(2) INFORMATION FOR SEQ ID NO: 306: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 306
TCTTTCCCAG GTTACTCCTT 

(2) INFORMATION FOR SEQ ID NO: 307: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 307 
ATCAAAGACT GTAAGTAACC

(2) INFORMATION FOR SEQ ID NO:308: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:308 
TCTATTTCAG ATGCTGATTC 

(2) INFORMATION FOR SEQ ID NO: 309: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 309 
AGTAGAACAA GTAAGTGCAG

(2) INFORMATION FOR SEQ ID NO: 310: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:310 
TTTTCAAAAG GCCTCCAAAG 

(2) INFORMATION FOR SEQ ID NO: 311: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 311 
GAGCCCTGAG GTAAGTTAAT 

(2) INFORMATION FOR SEQ ID NO: 312: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 312 

GCTTTTTCAG ATACTACTAT 

(2) INFORMATION FOR SEQ ID NO: 313:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 313 
TAACATGTTC AACTGTCTGT 

(2) INFORMATION FOR SEQ ID NO: 314:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 314 
TGTTATATGC ATTTATCTTC 

(2) INFORMATION FOR SEQ ID NO: 315: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:315 
GGTAAATGAG GTAAGTCCTG 

(2) INFORMATION FOR SEQ ID NO: 316:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 316
TCTTGTTAAG ATCGCTCTCT 

(2) INFORMATION FOR SEQ ID NO: 317: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 317 
CCTTGCCCAG GTTCTCTTAA 

(2) INFORMATION FOR SEQ ID NO: 318: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 318 
GCAATCGCAC CTGCACACCC 

(2) INFORMATION FOR SEQ ID NO:319: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 319 
ACTGCCCATT TCTGGTAAAG 

(2) INFORMATION FOR SEQ ID NO: 320: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 320 
CCCCTAACAG ATCATGATTC

(2) INFORMATION FOR SEQ ID NO: 321: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 321 
ACGTGCAATG GTAAGAGGGC 

(2) INFORMATION FOR SEQ ID NO: 322: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 322 
TGTTTTGCAG TTTCCAGTGG 

(2) INFORMATION FOR SEQ ID NO: 323: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 323 
AAGTGGAACG GTGACTCTCT 

(2) INFORMATION FOR SEQ ID NO: 324: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 324 
TCCTTCACAG GCCAGTGCAG 

(2) INFORMATION FOR SEQ ID NO: 325: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 325 
GAACAAACTG GTGAGTAGTA 

(2) INFORMATION FOR SEQ ID NO: 326: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 326 
TTTTTTGTAG AGCCTTCCAT 

(2) INFORMATION FOR SEQ ID NO:327: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 327 
AGCACAGTAG GTAACTAACT 

(2) INFORMATION FOR SEQ ID NO: 328: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 328 
ATGGCCACAG ATTTGTTGGA 

(2) INFORMATION FOR SEQ ID NO: 329: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 



(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 329 
CTTCCTGTTG GTAAGCTGTC 

(2) INFORMATION FOR SEQ ID NO: 330: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 330 
TTCTCCTTAG CAGAGTCACC 

(2) INFORMATION FOR SEQ ID NO: 331: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 331 
AAAAAGCACA GTAAGTTGGC 

(2) INFORMATION FOR SEQ ID NO: 332:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 332 
TTTTCATCAG ACCCGAGAGG 

(2) INFORMATION FOR SEQ ID NO: 333: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 333 
GAGCTATGAG GTGAGGAGTT

(2) INFORMATION FOR SEQ ID NO: 334: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 334 
TTTGTTACAG ATATTACTAC 

(2) INFORMATION FOR SEQ ID NO:335: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 335
AGCCTGGAAA TGCGTGTTTC 

(2) INFORMATION FOR SEQ ID NO:336: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 336 
CGAGAATTCA CTCGAGCATC AGG 

(2) INFORMATION FOR SEQ ID NO:337: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 337
CCTGATGCTC GAGTGAATTC T 




(2) INFORMATION FOR SEQ ID NO: 338: 






(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 848 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 
(ix) FEATURE: 

(A) NAME/KEY: Coding Sequence 

(B) LOCATION: 1...848
(D) OTHER INFORMATION: 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:338: 

ATG ATT CTG GAA GGA AGT GGT GTA ATG AAT CTC AAC CCA GCC AAC AAC 48
Met Ile Leu Glu Gly Ser Gly Val Met Asn Leu Asn Pro Ala Asn Asn
1 5 10 15

CTC CTT CAC CAG CAA CCA GCC TGG CCG GAC AGC TAC CCC ACA TGC AAT 96
Leu Leu His Gln Gln Pro Ala Trp Pro Asp Ser Tyr Pro Thr Cys Asn
20 25 30

GTT TCC AGC GGT TTT TTT GGA AGC CAG TGG CAT GAA ATC CAC CCT CAG 144
Val Ser Ser Gly Phe Phe Gly Ser Gln Trp His Glu Ile His Pro Gln
35 40 45

TAC TGG ACC AAA TAC CAG GTG TGG GAA TGG CTG CAG CAC CTC CTG GAC 192
Tyr Trp Thr Lys Tyr Gln Val Trp Glu Trp Leu Gln His Leu Leu Asp
50 55 60

ACC AAC CAG CTA GAC GCT AGC TGC ATC CCT TTC CAG GAG TTC GAC ATT 240
Thr Asn Gln Leu Asp Ala Ser Cys Ile Pro Phe Gln Glu Phe Asp Ile
65 70 75 80

AGC GGA GAA CAC CTG TGC AGC ATG AGT CTG CAG GAG TTC ACG AGG GCA 288
Ser Gly Glu His Leu Cys Ser Met Ser Leu Gln Glu Phe Thr Arg Ala
85 90 95

GCA GGC TCA GCT GGG CAG CTG CTC TAC AGC AAC CTA CAG CAT CTC AAG 336
Ala Gly Ser Ala Gly Gln Leu Leu Tyr Ser Asn Leu Gln His Leu Lys
100 105 110

TGG AAC GGC CAA TGC AGC AGT GAC CTT TTC CAG TCC GCA CAC AAT GTC 384
Trp Asn Gly Gln Cys Ser Ser Asp Leu Phe Gln Ser Ala His Asn Val
115 120 125

ATT GTC AAG ACT GAA CAA ACC GAT CCT TCC ATC ATG AAC ACA TGG AAA 432
Ile Val Lys Thr Glu Gln Thr Asp Pro Ser Ile Met Asn Thr Trp Lys
130 135 140

GAA GAA AAC TAT CTC TAT GAT CCC AGC TAT GGT AGC ACA GTA GAT CTG 480
Glu Glu Asn Tyr Leu Tyr Asp Pro Ser Tyr Gly Ser Thr Val Asp Leu
145 150 155 160

TTG GAC AGT AAG ACT TTC TGC CGG GCT CAG ATC TCC ATG ACA ACC TCC 528
Leu Asp Ser Lys Thr Phe Cys Arg Ala Gln Ile Ser Met Thr Thr Ser
165 170 175

AGT CAC CTT CCA GTT GCA GAG TCA CCT GAT ATG AAA AAG GAG CAA GAC 576
Ser His Leu Pro Val Ala Glu Ser Pro Asp Met Lys Lys Glu Gln Asp
180 185 190

CAC CCT GTA AAG TCC CAC ACC AAA AAG CAC AAC CCA AGA GGC ACT CAC 624
His Pro Val Lys Ser His Thr Lys Lys His Asn Pro Arg Gly Thr His
195 200 205

TTA TGG GAG TTC ATC CGA GAC ATT CTC TTG AGC CCA GAC AAG AAC CCA 672
Leu Trp Glu Phe Ile Arg Asp Ile Leu Leu Ser Pro Asp Lys Asn Pro
210 215 220

GGG CTG ATC AAA TGG GAA GAC CGT TCG GAA GGC ATC TTC AGG TTC CTG 720
Gly Leu Ile Lys Trp Glu Asp Arg Ser Glu Gly Ile Phe Arg Phe Leu
225 230 235 240

AAG TCA GAA GCT GTG GCT CAG CTG TGG GGG AAA AAG AAA AAT AAC AGT 768
Lys Ser Glu Ala Val Ala Gln Leu Trp Gly Lys Lys Lys Asn Asn Ser
245 250 255

AGC ATG ACA TAC GAG AAG CTC AGC CGG GCT ATG AGA TAT TAC TAC AAA 816
Ser Met Thr Tyr Glu Lys Leu Ser Arg Ala Met Arg Tyr Tyr Tyr Lys
260 265 270

CGA GAA ATC CTG GAA CGT GTG GAT GGA CGA CG 848
Arg Glu Ile Leu Glu Arg Val Asp Gly Arg Arg
275 280



(2) INFORMATION FOR SEQ ID NO: 339:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 283 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 
(v) FRAGMENT TYPE: internal 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 339: 

Met Ile Leu Glu Gly Ser Gly Val Met Asn Leu Asn Pro Ala Asn Asn
1 5 10 15

Leu Leu His Gln Gln Pro Ala Trp Pro Asp Ser Tyr Pro Thr Cys Asn
20 25 30

Val Ser Ser Gly Phe Phe Gly Ser Gln Trp His Glu Ile His Pro Gln
35 40 45

Tyr Trp Thr Lys Tyr Gln Val Trp Glu Trp Leu Gln His Leu Leu Asp
50 55 60

Thr Asn Gln Leu Asp Ala Ser Cys Ile Pro Phe Gln Glu Phe Asp Ile
65 70 75 80

Ser Gly Glu His Leu Cys Ser Met Ser Leu Gln Glu Phe Thr Arg Ala
85 90 95

Ala Gly Ser Ala Gly Gln Leu Leu Tyr Ser Asn Leu Gln His Leu Lys
100 105 110

Trp Asn Gly Gln Cys Ser Ser Asp Leu Phe Gln Ser Ala His Asn Val
115 120 125

Ile Val Lys Thr Glu Gln Thr Asp Pro Ser Ile Met Asn Thr Trp Lys
130 135 140

Glu Glu Asn Tyr Leu Tyr Asp Pro Ser Tyr Gly Ser Thr Val Asp Leu
145 150 155 160

Leu Asp Ser Lys Thr Phe Cys Arg Ala Gln Ile Ser Met Thr Thr Ser
165 170 175

Ser His Leu Pro Val Ala Glu Ser Pro Asp Met Lys Lys Glu Gln Asp
180 185 190

His Pro Val Lys Ser His Thr Lys Lys His Asn Pro Arg Gly Thr His
195 200 205

Leu Trp Glu Phe Ile Arg Asp Ile Leu Leu Ser Pro Asp Lys Asn Pro
210 215 220

Gly Leu Ile Lys Trp Glu Asp Arg Ser Glu Gly Ile Phe Arg Phe Leu
225 230 235 240

Lys Ser Glu Ala Val Ala Gln Leu Trp Gly Lys Lys Lys Asn Asn Ser
245 250 255

Ser Met Thr Tyr Glu Lys Leu Ser Arg Ala Met Arg Tyr Tyr Tyr Lys
260 265 270

Arg Glu Ile Leu Glu Arg Val Asp Gly Arg Arg
275 280

What is claimed is:

1. An isolated nucleic acid molecule comprising a sequence within a 
mammalian ASTH1 locus, or a polymorphic variant thereof. 

2. An isolated nucleic acid molecule according to Claim 1, wherein said
nucleic acid molecule encodes an ASTH1 polypeptide.

3. An isolated nucleic acid molecule according to Claim 1, wherein said
nucleic acid comprises a promoter or regulatory region.


4. An isolated nucleic acid molecule according to Claim 1 comprising a 
probe for detection of an ASTH1 locus polymorphism. 

5. An array of oligonucleotides comprising:
two or more probes according to Claim 4.

6. An isolated nucleic acid comprising a microsatellite repeat associated 
with a predisposition to asthma. 

7. A nucleic acid according to any of Claims 1 to 5, wherein said ASTH1
locus is human.

8. A cell comprising a nucleic acid composition according to any of 
claims 1 to 4. 


9. A purified polypeptide composition comprising at least 50 weight % of 
the protein present as the product of the nucleic acid of Claim 1. 

10. A method for detecting a predisposition to asthma in an individual, the
method comprising:

analyzing the genomic DNA or mRNA of said individual for the presence of at 
least one predisposing ASTH1 locus polymorphism or a sequence linked to a 
predisposing polymorphism; wherein the presence of said predisposing
polymorphism is indicative of an increased susceptibility to asthma. 

11. A method according to Claim 10, wherein said analyzing step 
comprises detection of specific binding between the genomic DNA or mRNA of said
individual with a probe or probes according to either of Claims 4 or 5. 



12. A method according to Claim 10, wherein said analyzing step
comprises detection of specific binding between the genomic DNA or mRNA of said
individual with a microsatellite marker listed in Table 1.

13. A non-human transgenic animal model for ASTH1 gene function 
comprising one of:

(a) a knockout of an ASTH1 gene; 
(b) an exogenous and stably transmitted mammalian ASTH1 gene
sequence; or

(c) an ASTH1 promoter sequence operably linked to a reporter gene. 



14. A method of screening for biologically active agents that modulate 
ASTH1 function, the method comprising:

combining a candidate biologically active agent with any one of: 

(a) a mammalian ASTH1 polypeptide; 

(b) a cell comprising a nucleic acid encoding a mammalian ASTH1 
polypeptide; or 

(c) a non-human transgenic animal model for ASTH1 gene function
comprising one of: (i) a knockout of an ASTH1 gene; (ii) an exogenous and stably
transmitted mammalian ASTH1 gene sequence; or (iii) an ASTH1 promoter 
sequence operably linked to a reporter gene; and 

determining the effect of said agent on ASTH1 function. 
15. An isolated nucleic acid that hybridizes under stringent conditions to
any one of: SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID
NO:6, SEQ ID NO:8, SEQ ID NO:10, or SEQ ID NO:328.



16. An isolated nucleic acid that encodes a polypeptide or fragment
thereof having an amino acid sequence substantially identical to the sequence as
set forth within any one of SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID
NO:11, or SEQ ID NO:339.